Intrinsic vs. extrinsic evaluation measures for referring expression generation

Anja Belz; Albert Gatt

University of Brighton

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN › Conference contribution with ISSN or ISBN › peer-review

Abstract

In this paper we present research in which we apply (i) the kind of intrinsic evaluation metrics that are characteristic of current comparative HLT evaluation, and (ii) extrinsic, human task-performance evaluations more in keeping with NLG traditions, to 15 systems implementing a language generation task. We analyse the evaluation results and find that there are no significant correlations between intrinsic and extrinsic evaluation measures for this task.

Original language	English
Title of host publication	Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL'08)
Place of Publication	Columbus, Ohio
Pages	197-200
Number of pages	4
Publication status	Published - 1 Jan 2008
Event	Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL'08) - Columbus, Ohio, USA
Duration	1 Jan 2008 → …

Conference

Conference	Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL'08)
Period	1/01/08 → …

Access to Document

http://aclweb.org/anthology-new/P/P08/P08-2050.pdf
Licence: Unspecified

Cite this

@inproceedings{08e890942f884606acab712aef999f16,
  title = "Intrinsic vs. extrinsic evaluation measures for referring expression generation",
  abstract = "In this paper we present research in which we apply (i) the kind of intrinsic evaluation metrics that are characteristic of current comparative HLT evaluation, and (ii) extrinsic, human task-performance evaluations more in keeping with NLG traditions, to 15 systems implementing a language generation task. We analyse the evaluation results and find that there are no significant correlations between intrinsic and extrinsic evaluation measures for this task.",
  author = "Anja Belz and Albert Gatt",
  year = "2008",
  month = jan,
  day = "1",
  language = "English",
  pages = "197--200",
  booktitle = "Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL'08)",
  note = "Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL'08) ; Conference date: 01-01-2008",
}

Belz, A & Gatt, A 2008, Intrinsic vs. extrinsic evaluation measures for referring expression generation. in Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL'08). Columbus, Ohio, pp. 197-200, Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL'08), 1/01/08. <http://aclweb.org/anthology-new/P/P08/P08-2050.pdf>

Intrinsic vs. extrinsic evaluation measures for referring expression generation. / Belz, Anja; Gatt, Albert.
Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL'08). Columbus, Ohio, 2008. p. 197-200.

TY  - GEN
T1  - Intrinsic vs. extrinsic evaluation measures for referring expression generation
AU  - Belz, Anja
AU  - Gatt, Albert
PY  - 2008/1/1
Y1  - 2008/1/1
N2  - In this paper we present research in which we apply (i) the kind of intrinsic evaluation metrics that are characteristic of current comparative HLT evaluation, and (ii) extrinsic, human task-performance evaluations more in keeping with NLG traditions, to 15 systems implementing a language generation task. We analyse the evaluation results and find that there are no significant correlations between intrinsic and extrinsic evaluation measures for this task.
AB  - In this paper we present research in which we apply (i) the kind of intrinsic evaluation metrics that are characteristic of current comparative HLT evaluation, and (ii) extrinsic, human task-performance evaluations more in keeping with NLG traditions, to 15 systems implementing a language generation task. We analyse the evaluation results and find that there are no significant correlations between intrinsic and extrinsic evaluation measures for this task.
M3  - Conference contribution with ISSN or ISBN
SP  - 197
EP  - 200
BT  - Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL'08)
CY  - Columbus, Ohio
T2  - Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL'08)
Y2  - 1 January 2008
ER  -