Unsupervised alignment of comparable data and text resources

Anja Belz, Eric Kow

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN › Conference contribution with ISSN or ISBN

Abstract

In this paper we investigate automatic data-text alignment, i.e. the task of automatically aligning data records with textual descriptions, such that data tokens are aligned with the word strings that describe them. Our methods make use of log likelihood ratios to estimate the strength of association between data tokens and text tokens. We investigate data-text alignment at the document level and at the sentence level, reporting results for several methodological variants as well as baselines. We find that log likelihood ratios provide a strong basis for predicting data-text alignment.
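
As an illustration of the kind of association score the abstract refers to, the sketch below computes a log likelihood ratio (the G² statistic in Dunning's formulation) from a 2x2 table of co-occurrence counts for one data token and one text token. The abstract does not give the exact formulation used in the paper, so the choice of statistic, the function name and the count layout are all assumptions made for illustration only.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table (illustrative assumption).

    Hypothetical count layout for one (data token, text token) pair:
      k11: units (documents or sentences) containing both tokens
      k12: units containing the data token only
      k21: units containing the text token only
      k22: units containing neither
    """
    def xlogx(x):
        # x * ln(x), with the usual convention that 0 * ln(0) = 0
        return x * math.log(x) if x > 0 else 0.0

    n = k11 + k12 + k21 + k22
    cells = xlogx(k11) + xlogx(k12) + xlogx(k21) + xlogx(k22)
    rows = xlogx(k11 + k12) + xlogx(k21 + k22)
    cols = xlogx(k11 + k21) + xlogx(k12 + k22)
    # Equivalent to 2 * sum(observed * ln(observed / expected))
    return 2.0 * (cells + xlogx(n) - rows - cols)


# Tokens that always co-occur score high; independent tokens score near zero.
strong = log_likelihood_ratio(10, 0, 0, 10)
independent = log_likelihood_ratio(10, 10, 10, 10)
```

Note that G² is large for any departure from independence, including tokens that systematically avoid each other, so a practical aligner would also need to check the direction of the association.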
Original language: English
Title of host publication: Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics
Place of Publication: Stroudsburg, PA, USA
Publisher: Association for Computational Linguistics
Pages: 102-109
Number of pages: 8
Publication status: Published - 1 May 2011
Event: Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics - Portland, Oregon, USA
Duration: 1 May 2011 → …

Workshop

Workshop: Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics
Period: 1/05/11 → …

Cite this

Belz, A., & Kow, E. (2011). Unsupervised alignment of comparable data and text resources. In Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics (pp. 102-109). Association for Computational Linguistics. http://www.mt-archive.info/BUCC-2011-Belz.pdf