From machine readable dictionaries to lexical databases: the CONCEDE experience

T. Erjavec, Roger Evans, N. Ide, A. Kilgarriff

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN; Conference contribution with ISSN or ISBN; peer-review

Abstract

It is commonly held that machine-readable dictionaries play a key role in bootstrapping effective wide-coverage language technology, especially for less well-resourced languages. However, while the linguistic knowledge they contain is clearly necessary for this goal, it is far from clear that the format in which it is presented is sufficient to reach it. A crucial step in the deployment of such resources is to map them into lexical databases with standardised and well-understood structure and semantics. Furthermore, considerable additional benefits are obtained if such structure and semantics are shared with other linguistic resources. Achieving such a goal, however, is often not an easy task. This paper describes how such a mapping was carried out in the CONCEDE project for six Central and Eastern European languages (Bulgarian, Czech, Estonian, Hungarian, Romanian, and Slovene) for which few wide-coverage lexical resources had previously been available. In a two-stage process, the machine-readable data for each language was first mapped into broadly compatible, TEI-compliant SGML representations, and these representations were then harmonised into a single XML scheme. The resulting framework offers a concise, flexible lexical database specification, with a demonstrable ability to cope with a diverse range of dictionary and language requirements, and lexical resources suitable for monolingual and multilingual applications.
Original language: English
Title of host publication: COMPLEX 2003, 7th Conference on Computational Lexicography and Text Research
Pages: 1-9
Number of pages: 9
Publication status: Published - 2003
Event: COMPLEX 2003, 7th Conference on Computational Lexicography and Text Research - Budapest, Hungary
Duration: 1 Jan 2003 → …
