Abstract
Building NLG systems, in particular statistical ones, requires parallel data (paired inputs and outputs), which do not generally occur naturally. In this paper, we investigate the idea of automatically extracting parallel resources for data-to-text generation from comparable corpora obtained from the Web. We describe our comparable corpus of data and texts relating to British hills, and the techniques we have developed so far for extracting paired input/output fragments.
Original language | English |
---|---|
Title of host publication | Proceedings of 6th International Natural Language Generation Conference (INLG'10) |
Place of Publication | Stroudsburg, PA, USA |
Publisher | Association for Computational Linguistics |
Pages | 167-171 |
Number of pages | 5 |
Publication status | Published - 1 Jan 2010 |
Event | Proceedings of 6th International Natural Language Generation Conference (INLG'10), Dublin, Ireland. Duration: 1 Jan 2010 → … |