A game-based approach to transcribing images of text

Khalil Dahab; Anja Belz

University of Brighton

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN › Conference contribution with ISSN or ISBN › peer-review

Abstract

We present a methodology that takes as input scanned documents of typed or hand-written text, and produces transcriptions of the text as output. Instead of using OCR technology, the methodology is game-based and produces such transcriptions as a by-product. The approach is intended particularly for languages for which language technology and resources are scarce and reliable OCR technology may not exist. It can be used in place of OCR for transcribing individual documents, or to create corpora of paired images and transcriptions required to train OCR tools. We present Minefield, a prototype implementation of the approach which is currently collecting Arabic transcriptions.

Original language	English
Title of host publication	7th International Conference on Language Resources and Evaluation
Place of Publication	France
Publisher	European Language Resources Association (ELRA)
Pages	0-0
Number of pages	1
Publication status	Published - 1 Jan 2010
Event	7th International Conference on Language Resources and Evaluation - Valletta, Malta. Duration: 1 Jan 2010 → …

Conference

Conference	7th International Conference on Language Resources and Evaluation
Period	1/01/10 → …

Access to Document

http://www.itri.brighton.ac.uk/~Anja.Belz/Publications/abudahab-belz-final.pdf

Licence: Unspecified

Cite this

@inproceedings{ebc11634d1ef43cba7b7e54b6658bf1a,

title = "A game-based approach to transcribing images of text",

abstract = "We present a methodology that takes as input scanned documents of typed or hand-written text, and produces transcriptions of the text as output. Instead of using OCR technology, the methodology is game-based and produces such transcriptions as a by-product. The approach is intended particularly for languages for which language technology and resources are scarce and reliable OCR technology may not exist. It can be used in place of OCR for transcribing individual documents, or to create corpora of paired images and transcriptions required to train OCR tools. We present Minefield, a prototype implementation of the approach which is currently collecting Arabic transcriptions.",

author = "Khalil Dahab and Anja Belz",

year = "2010",

month = jan,

day = "1",

language = "English",

pages = "0--0",

booktitle = "7th International Conference on Language Resources and Evaluation",

publisher = "European Language Resources Association (ELRA)",

note = "7th International Conference on Language Resources and Evaluation ; Conference date: 01-01-2010",

}

Dahab, K & Belz, A 2010, A game-based approach to transcribing images of text. in 7th International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA), France, pp. 0-0, 7th International Conference on Language Resources and Evaluation, 1/01/10. <http://www.itri.brighton.ac.uk/~Anja.Belz/Publications/abudahab-belz-final.pdf>

TY - GEN

T1 - A game-based approach to transcribing images of text

AU - Dahab, Khalil

AU - Belz, Anja

PY - 2010/1/1

Y1 - 2010/1/1

N2 - We present a methodology that takes as input scanned documents of typed or hand-written text, and produces transcriptions of the text as output. Instead of using OCR technology, the methodology is game-based and produces such transcriptions as a by-product. The approach is intended particularly for languages for which language technology and resources are scarce and reliable OCR technology may not exist. It can be used in place of OCR for transcribing individual documents, or to create corpora of paired images and transcriptions required to train OCR tools. We present Minefield, a prototype implementation of the approach which is currently collecting Arabic transcriptions.

AB - We present a methodology that takes as input scanned documents of typed or hand-written text, and produces transcriptions of the text as output. Instead of using OCR technology, the methodology is game-based and produces such transcriptions as a by-product. The approach is intended particularly for languages for which language technology and resources are scarce and reliable OCR technology may not exist. It can be used in place of OCR for transcribing individual documents, or to create corpora of paired images and transcriptions required to train OCR tools. We present Minefield, a prototype implementation of the approach which is currently collecting Arabic transcriptions.

M3 - Conference contribution with ISSN or ISBN

SP - 0

EP - 0

BT - 7th International Conference on Language Resources and Evaluation

PB - European Language Resources Association (ELRA)

CY - France

T2 - 7th International Conference on Language Resources and Evaluation

Y2 - 1 January 2010

ER -
