Abstract
We present a methodology that takes scanned documents of typed or handwritten text as input and produces transcriptions of that text as output. The methodology does not use OCR technology; it is game-based and produces the transcriptions as a by-product. The approach is intended particularly for languages for which language technology and resources are scarce and reliable OCR technology may not exist. It can be used in place of OCR to transcribe individual documents, or to create the corpora of paired images and transcriptions required to train OCR tools. We present Minefield, a prototype implementation of the approach that is currently collecting Arabic transcriptions.
Original language | English |
---|---|
Title of host publication | 7th International Conference on Language Resources and Evaluation |
Place of Publication | France |
Publisher | European Language Resources Association (ELRA) |
Pages | 0-0 |
Number of pages | 1 |
Publication status | Published - 1 Jan 2010 |
Event | 7th International Conference on Language Resources and Evaluation, Valletta, Malta (Duration: 1 Jan 2010 → …) |