From machine readable dictionaries to lexical databases: the CONCEDE experience

T. Erjavec, Roger Evans, N. Ide, A. Kilgarriff

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN, Conference contribution with ISSN or ISBN, Research, peer-review

Abstract

It is commonly held that machine-readable dictionaries play a key role in bootstrapping effective wide-coverage language technology, especially for less well-resourced languages. However, while the linguistic knowledge they contain is clearly necessary for this goal, it is far from clear that the format in which it is presented is sufficient to reach it. A crucial step in the deployment of such resources is to map them into lexical databases with standardised and well-understood structure and semantics. Furthermore, considerable additional benefits are obtained if such structure and semantics are shared with other linguistic resources. Achieving such a goal, however, is often not an easy task. This paper describes how such a mapping was carried out in the CONCEDE project for six Central and Eastern European languages (Bulgarian, Czech, Estonian, Hungarian, Romanian, and Slovene) for which few wide-coverage lexical resources had previously been available. In a two-stage process, the machine-readable data for each language was first mapped into broadly compatible, TEI-compliant SGML representations, and then these representations were harmonised into a single XML scheme. The resulting framework offers a concise, flexible lexical database specification, with a demonstrable ability to cope with a diverse range of dictionary and language requirements, and lexical resources suitable for monolingual and multilingual applications.
Original language: English
Title of host publication: COMPLEX 2003, 7th Conference on Computational Lexicography and Text Research
Pages: 1-9
Number of pages: 9
Publication status: Published - 2003
Event: COMPLEX 2003, 7th Conference on Computational Lexicography and Text Research - Budapest, Hungary
Duration: 1 Jan 2003 → …

Conference

Conference: COMPLEX 2003, 7th Conference on Computational Lexicography and Text Research
Period: 1/01/03 → …

Fingerprint

Glossaries
Linguistics
SGML
Semantics
XML
Specifications

Cite this

Erjavec, T., Evans, R., Ide, N., & Kilgarriff, A. (2003). From machine readable dictionaries to lexical databases: the CONCEDE experience. In COMPLEX 2003, 7th Conference on Computational Lexicography and Text Research (pp. 1-9)
Erjavec, T. ; Evans, Roger ; Ide, N. ; Kilgarriff, A. / From machine readable dictionaries to lexical databases: the CONCEDE experience. COMPLEX 2003, 7th Conference on Computational Lexicography and Text Research. 2003. pp. 1-9
@inproceedings{e911f04dd0a44cd6ba5e476c6ea10020,
title = "From machine readable dictionaries to lexical databases: the CONCEDE experience",
abstract = "It is commonly held that machine-readable dictionaries play a key role in bootstrapping effective wide-coverage language-technology, especially in less well-resourced languages. However, while the linguistic knowledge they contain is clearly necessary for this goal, it is far from clear that the format it is presented in is sufficient to reach it. A crucial step in the deployment of such resources is to map them into lexical databases with standardised and well-understood structure and semantics. Furthermore, considerable additional benefits are obtained if such structure and semantics are shared with other linguistic resources. Achieving such a goal, however, is often not an easy task. This paper describes how such a mapping was carried out in the CONCEDE project, for six Central and Eastern European Languages (Bulgarian, Czech, Estonian, Hungarian, Romanian, and Slovene) for which few wide-coverage lexical resources had previously been available. In a two-stage process, the machine-readable data for each language was first mapped into broadly compatible, TEI-compliant SGML representations, and then these representations were harmonised into a single XML scheme. The resulting framework offers a concise, flexible lexical database specification, with a demonstrable ability to cope with a diverse range of dictionary and language requirements, and lexical resources suitable for monolingual and multilingual application.",
author = "T. Erjavec and Roger Evans and N. Ide and A. Kilgarriff",
year = "2003",
language = "English",
pages = "1--9",
booktitle = "COMPLEX 2003, 7th Conference on Computational Lexicography and Text Research",

}

Erjavec, T, Evans, R, Ide, N & Kilgarriff, A 2003, From machine readable dictionaries to lexical databases: the CONCEDE experience. in COMPLEX 2003, 7th Conference on Computational Lexicography and Text Research. pp. 1-9, COMPLEX 2003, 7th Conference on Computational Lexicography and Text Research, 1/01/03.

From machine readable dictionaries to lexical databases: the CONCEDE experience. / Erjavec, T.; Evans, Roger; Ide, N.; Kilgarriff, A.

COMPLEX 2003, 7th Conference on Computational Lexicography and Text Research. 2003. p. 1-9.


TY - GEN

T1 - From machine readable dictionaries to lexical databases: the CONCEDE experience

AU - Erjavec, T.

AU - Evans, Roger

AU - Ide, N.

AU - Kilgarriff, A.

PY - 2003

Y1 - 2003

N2 - It is commonly held that machine-readable dictionaries play a key role in bootstrapping effective wide-coverage language-technology, especially in less well-resourced languages. However, while the linguistic knowledge they contain is clearly necessary for this goal, it is far from clear that the format it is presented in is sufficient to reach it. A crucial step in the deployment of such resources is to map them into lexical databases with standardised and well-understood structure and semantics. Furthermore, considerable additional benefits are obtained if such structure and semantics are shared with other linguistic resources. Achieving such a goal, however, is often not an easy task. This paper describes how such a mapping was carried out in the CONCEDE project, for six Central and Eastern European Languages (Bulgarian, Czech, Estonian, Hungarian, Romanian, and Slovene) for which few wide-coverage lexical resources had previously been available. In a two-stage process, the machine-readable data for each language was first mapped into broadly compatible, TEI-compliant SGML representations, and then these representations were harmonised into a single XML scheme. The resulting framework offers a concise, flexible lexical database specification, with a demonstrable ability to cope with a diverse range of dictionary and language requirements, and lexical resources suitable for monolingual and multilingual application.


M3 - Conference contribution with ISSN or ISBN

SP - 1

EP - 9

BT - COMPLEX 2003, 7th Conference on Computational Lexicography and Text Research

ER -

Erjavec T, Evans R, Ide N, Kilgarriff A. From machine readable dictionaries to lexical databases: the CONCEDE experience. In COMPLEX 2003, 7th Conference on Computational Lexicography and Text Research. 2003. p. 1-9