Unsupervised alignment of comparable data and text resources

Anja Belz; Eric Kow

University of Brighton

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN › Conference contribution with ISSN or ISBN › peer-review

Abstract

In this paper we investigate automatic data-text alignment, i.e. the task of automatically aligning data records with textual descriptions, such that data tokens are aligned with the word strings that describe them. Our methods make use of log likelihood ratios to estimate the strength of association between data tokens and text tokens. We investigate data-text alignment at the document level and at the sentence level, reporting results for several methodological variants as well as baselines. We find that log likelihood ratios provide a strong basis for predicting data-text alignment.
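
As a rough illustration of the association measure mentioned in the abstract, the sketch below computes a log likelihood ratio (the G² statistic) from a 2x2 co-occurrence table for one data token and one text token. The abstract does not give the exact formulation used in the paper, so this is an assumption: it uses the common contingency-table form, and the function name and the count layout (k11 = records containing both tokens, k12 = data token only, k21 = text token only, k22 = neither) are hypothetical.

```python
import math


def _xlogx(k):
    """Return k * ln(k), treating 0 * ln(0) as 0."""
    return k * math.log(k) if k > 0 else 0.0


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of co-occurrence counts.

    Assumed layout (not taken from the paper):
      k11: data token and text token both present
      k12: data token present, text token absent
      k21: data token absent, text token present
      k22: neither present
    """
    n = k11 + k12 + k21 + k22
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    # G^2 = 2 * sum_ij k_ij * ln(k_ij * n / (row_i * col_j)),
    # rewritten in terms of x*ln(x) sums; clamp tiny negative rounding errors.
    return max(0.0, 2.0 * (cells - rows - cols + _xlogx(n)))


if __name__ == "__main__":
    # Independent tokens score near zero; perfectly co-occurring tokens score high.
    print(log_likelihood_ratio(10, 10, 10, 10))
    print(log_likelihood_ratio(10, 0, 0, 10))
```

Higher scores indicate that the two tokens co-occur more (or less) often than independence would predict. An alignment method along the lines described above would presumably rank candidate token pairs by such a score, though the ranking and thresholding details are not stated here.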

Original language	English
Title of host publication	Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics
Place of Publication	Stroudsburg, PA, USA
Publisher	Association for Computational Linguistics
Pages	102-109
Number of pages	8
Publication status	Published - 1 May 2011
Event	Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics - Portland, Oregon, USA Duration: 1 May 2011 → …

Workshop

Workshop	Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics
Period	1/05/11 → …

Access to Document

http://www.mt-archive.info/BUCC-2011-Belz.pdf
Licence: Unspecified

Cite this

@inproceedings{a192af53353e4a90be26460812198d7d,

title = "Unsupervised alignment of comparable data and text resources",

abstract = "In this paper we investigate automatic data-text alignment, i.e. the task of automatically aligning data records with textual descriptions, such that data tokens are aligned with the word strings that describe them. Our methods make use of log likelihood ratios to estimate the strength of association between data tokens and text tokens. We investigate data-text alignment at the document level and at the sentence level, reporting results for several methodological variants as well as baselines. We find that log likelihood ratios provide a strong basis for predicting data-text alignment.",

author = "Anja Belz and Eric Kow",

year = "2011",

month = may,

day = "1",

language = "English",

pages = "102--109",

booktitle = "Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics",

publisher = "Association for Computational Linguistics",

note = "Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics ; Conference date: 01-05-2011",

}

Belz, A & Kow, E 2011, Unsupervised alignment of comparable data and text resources. in Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 102-109, Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics, 1/05/11. <http://www.mt-archive.info/BUCC-2011-Belz.pdf>

Unsupervised alignment of comparable data and text resources. / Belz, Anja; Kow, Eric.
Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA, USA: Association for Computational Linguistics, 2011. p. 102-109.

TY - GEN

T1 - Unsupervised alignment of comparable data and text resources

AU - Belz, Anja

AU - Kow, Eric

PY - 2011/5/1

Y1 - 2011/5/1

N2 - In this paper we investigate automatic data-text alignment, i.e. the task of automatically aligning data records with textual descriptions, such that data tokens are aligned with the word strings that describe them. Our methods make use of log likelihood ratios to estimate the strength of association between data tokens and text tokens. We investigate data-text alignment at the document level and at the sentence level, reporting results for several methodological variants as well as baselines. We find that log likelihood ratios provide a strong basis for predicting data-text alignment.

AB - In this paper we investigate automatic data-text alignment, i.e. the task of automatically aligning data records with textual descriptions, such that data tokens are aligned with the word strings that describe them. Our methods make use of log likelihood ratios to estimate the strength of association between data tokens and text tokens. We investigate data-text alignment at the document level and at the sentence level, reporting results for several methodological variants as well as baselines. We find that log likelihood ratios provide a strong basis for predicting data-text alignment.

M3 - Conference contribution with ISSN or ISBN

SP - 102

EP - 109

BT - Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics

PB - Association for Computational Linguistics

CY - Stroudsburg, PA, USA

T2 - Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics

Y2 - 1 May 2011

ER -