Extracting parallel fragments from comparable corpora for data-to-text generation

Anja Belz, Eric Kow

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN > Conference contribution with ISSN or ISBN

Abstract

Building NLG systems, in particular statistical ones, requires parallel data (paired inputs and outputs) which do not generally occur naturally. In this paper, we investigate the idea of automatically extracting parallel resources for data-to-text generation from comparable corpora obtained from the Web. We describe our comparable corpus of data and texts relating to British hills and the techniques for extracting paired input/output fragments we have developed so far.
Original language: English
Title of host publication: Proceedings of 6th International Natural Language Generation Conference (INLG'10)
Place of Publication: Stroudsburg, PA, USA
Publisher: Association for Computational Linguistics
Pages: 167-171
Number of pages: 5
DOIs
Publication status: Published - 1 Jan 2010
Event: Proceedings of 6th International Natural Language Generation Conference (INLG'10) - Dublin, Ireland
Duration: 1 Jan 2010 → …

Conference

Conference: Proceedings of 6th International Natural Language Generation Conference (INLG'10)
Period: 1/01/10 → …

Cite this

Belz, A., & Kow, E. (2010). Extracting parallel fragments from comparable corpora for data-to-text generation. In Proceedings of 6th International Natural Language Generation Conference (INLG'10) (pp. 167-171). Association for Computational Linguistics. https://doi.org/10.1.1.180.3640