Statistical generation: three methods compared and evaluated

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN > Conference contribution with ISSN or ISBN

Abstract

Statistical NLG has largely meant n-gram modelling, which has the considerable advantages of lending robustness to NLG systems and of making automatic adaptation to new domains from raw corpora possible. On the downside, n-gram models are expensive to use as selection mechanisms and have a built-in bias towards shorter realisations. This paper looks at treebank-training of generators, an alternative method for building statistical models for NLG from raw corpora, and at two different ways of using treebank-trained models during generation. Results show that the treebank-trained generators achieve improvements similar to those of a 2-gram generator over a baseline of random selection. However, the treebank-trained generators achieve this at a much lower cost than the 2-gram generator, and without its strong preference for shorter realisations.
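
The length bias mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the bigram table, its probabilities, the example sentences, and the function name `bigram_log_prob` are all hypothetical and chosen only for illustration. A 2-gram score is a product of conditional probabilities, each at most 1, so every extra word can only lower the score, and a selector that picks the highest-scoring candidate tends to favour the shorter one.

```python
import math

# Hypothetical bigram probabilities P(next | previous), for illustration only.
# Every transition is given the same probability so that length is the
# only thing that differs between the two candidates.
BIGRAM_P = {
    ("<s>", "rain"): 0.5,
    ("rain", "later"): 0.5,
    ("later", "</s>"): 0.5,
    ("<s>", "it"): 0.5,
    ("it", "will"): 0.5,
    ("will", "rain"): 0.5,
}


def bigram_log_prob(words, probs=BIGRAM_P):
    """Sum the log-probabilities of all adjacent word pairs, with boundary markers."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    return sum(math.log(probs[(a, b)]) for a, b in zip(tokens, tokens[1:]))


short = ["rain", "later"]                # 3 bigrams including boundaries
long = ["it", "will", "rain", "later"]   # 5 bigrams including boundaries

# Selecting the most probable candidate picks the shorter realisation.
best = max([short, long], key=bigram_log_prob)
```

With these made-up numbers the short candidate scores 0.5^3 and the long one 0.5^5, so `best` is the short realisation even though no individual transition is less likely in the long one.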
Original language: English
Title of host publication: Proceedings of the 10th European Workshop On Natural Language Generation
Place of Publication: Helsinki, Finland
Pages: 15-23
Number of pages: 9
Publication status: Published - 1 Jan 2005
Event: Proceedings of the 10th European Workshop On Natural Language Generation - Aberdeen, Scotland
Duration: 1 Jan 2005 → …

Workshop

Workshop: Proceedings of the 10th European Workshop On Natural Language Generation
Period: 1/01/05 → …

Keywords

  • Natural language generation

  • Cite this

    Belz, A. (2005). Statistical generation: three methods compared and evaluated. In Proceedings of the 10th European Workshop On Natural Language Generation (pp. 15-23). http://www.ling.helsinki.fi/~gwilcock/ENLG-05/ENLG-05-Proceedings.pdf