Optimisation of corpus-derived probabilistic grammars

Anja Belz

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN; Conference contribution with ISSN or ISBN; peer-review

Abstract

This paper examines the usefulness of corpus-derived probabilistic grammars as a basis for the automatic construction of grammars optimised for a given parsing task. Initially, a probabilistic context-free grammar (PCFG) is derived by a straightforward derivation technique from the Wall Street Journal (WSJ) Corpus, and a baseline is established by testing the resulting grammar on four different parsing tasks. In the first optimisation step, different kinds of local structural context (LSC) are incorporated into the basic PCFG. Improved parsing results demonstrate the usefulness of the added structural context information. In the second optimisation step, LSC-PCFGs are optimised in terms of grammar size and performance for a given parsing task. Tests show that significant improvements can be achieved by the method proposed. The structure of this paper is as follows. Section 2 discusses the practica
Original language: English
Title of host publication: Corpus Linguistics 2001
Place of Publication: Lancaster, UK
Pages: 46-57
Number of pages: 12
Publication status: Published - 1 Mar 2001
Event: Corpus Linguistics 2001 - Lancaster University, UK
Duration: 1 Mar 2001 → …
