Abstract
Statistical NLG has largely meant n-gram modelling, which has the considerable advantages of lending robustness to NLG systems and of making automatic adaptation to new domains from raw corpora possible. On the downside, n-gram models are expensive to use as selection mechanisms and have a built-in bias towards shorter realisations. This paper looks at treebank-training of generators, an alternative method for building statistical models for NLG from raw corpora, and two different ways of using treebank-trained models during generation. Results show that the treebank-trained generators achieve improvements over a baseline of random selection similar to those achieved by a 2-gram generator. However, the treebank-trained generators achieve this at a much lower cost than the 2-gram generator, and without its strong preference for shorter realisations.
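The length bias mentioned in the abstract can be illustrated with a minimal sketch (not the paper's own code; the bigram probabilities and vocabulary below are invented for illustration): because an n-gram score multiplies in one probability per token, a longer but equally fluent realisation generally receives a lower score than a shorter one.

```python
import math

# Toy bigram model: P(w_i | w_{i-1}) over a tiny, made-up vocabulary.
# These probabilities are illustrative only, not estimated from any corpus.
bigram_prob = {
    ("<s>", "the"): 0.6, ("the", "door"): 0.3, ("door", "opens"): 0.4,
    ("opens", "</s>"): 0.5, ("door", "slowly"): 0.2, ("slowly", "opens"): 0.5,
}

def bigram_score(tokens, unseen=1e-4):
    """Log-probability of a candidate realisation under the toy bigram model."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(math.log(bigram_prob.get(pair, unseen))
               for pair in zip(padded, padded[1:]))

short_realisation = ["the", "door", "opens"]
longer_realisation = ["the", "door", "slowly", "opens"]

# Each extra token multiplies in another probability < 1, so the longer
# realisation scores lower even though both are fluent.
print(bigram_score(short_realisation))   # higher (less negative) score
print(bigram_score(longer_realisation))  # lower score: penalised for length
```

Selecting the highest-scoring candidate under such a model therefore systematically favours shorter outputs, which is the bias the treebank-trained generators described in the paper are reported to avoid.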
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 10th European Workshop On Natural Language Generation |
| Place of Publication | Helsinki, Finland |
| Pages | 15-23 |
| Number of pages | 9 |
| Publication status | Published - 1 Jan 2005 |
| Event | Proceedings of the 10th European Workshop On Natural Language Generation - Aberdeen, Scotland |
| Duration | 1 Jan 2005 → … |
Workshop
| Workshop | Proceedings of the 10th European Workshop On Natural Language Generation |
|---|---|
| Period | 1/01/05 → … |
Keywords
- Natural language generation