Statistical generation: three methods compared and evaluated

Anja Belz

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN › Conference contribution with ISSN or ISBN › peer-review

Abstract

Statistical NLG has largely meant n-gram modelling, which has the considerable advantages of lending robustness to NLG systems and of making automatic adaptation to new domains from raw corpora possible. On the downside, n-gram models are expensive to use as selection mechanisms and have a built-in bias towards shorter realisations. This paper looks at treebank-training of generators, an alternative method for building statistical models for NLG from raw corpora, and two different ways of using treebank-trained models during generation. Results show that the treebank-trained generators achieve improvements similar to a 2-gram generator over a baseline of random selection. However, the treebank-trained generators achieve this at a much lower cost than the 2-gram generator, and without its strong preference for shorter realisations.
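The abstract's point about length bias can be seen directly in how n-gram selection works: a candidate realisation's score is a product of per-token probabilities, so each extra token multiplies in another factor below one. The following is a minimal illustrative sketch, not the paper's actual model; the bigram probabilities and sentences are invented for demonstration.

```python
import math

# Toy bigram log-probabilities (invented for illustration only).
BIGRAM_LOGPROB = {
    ("<s>", "rain"): math.log(0.4),
    ("rain", "expected"): math.log(0.5),
    ("expected", "</s>"): math.log(0.6),
    ("rain", "is"): math.log(0.3),
    ("is", "expected"): math.log(0.7),
}

def score(tokens):
    """Sum bigram log-probabilities over a candidate realisation.

    Each extra token contributes another negative log term, so longer
    candidates tend to score lower even when equally fluent -- the
    length bias the abstract attributes to n-gram selection.
    """
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(BIGRAM_LOGPROB.get(pair, math.log(1e-6))
               for pair in zip(padded, padded[1:]))

short = ["rain", "expected"]
longer = ["rain", "is", "expected"]
print(score(short))   # ~ -2.12
print(score(longer))  # ~ -2.99: the shorter realisation wins
```

Under random selection, by contrast, both candidates would be equally likely, which is why length bias is a property of the n-gram scorer specifically.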
Original language: English
Title of host publication: Proceedings of the 10th European Workshop on Natural Language Generation
Place of publication: Helsinki, Finland
Pages: 15-23
Number of pages: 9
Publication status: Published - 1 Jan 2005
Event: Proceedings of the 10th European Workshop on Natural Language Generation - Aberdeen, Scotland
Duration: 1 Jan 2005 → …

Keywords

  • Natural language generation
