Optimisation of corpus-derived probabilistic grammars

Anja Belz

doi:10.1.1.7.4428

University of Brighton

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN › Conference contribution with ISSN or ISBN › peer-review

Abstract

This paper examines the usefulness of corpus-derived probabilistic grammars as a basis for the automatic construction of grammars optimised for a given parsing task. Initially, a probabilistic context-free grammar (PCFG) is derived by a straightforward derivation technique from the Wall Street Journal (WSJ) Corpus, and a baseline is established by testing the resulting grammar on four different parsing tasks. In the first optimisation step, different kinds of local structural context (LSC) are incorporated into the basic PCFG. Improved parsing results demonstrate the usefulness of the added structural context information. In the second optimisation step, LSC-PCFGs are optimised in terms of grammar size and performance for a given parsing task. Tests show that significant improvements can be achieved by the method proposed. The structure of this paper is as follows. Section 2 discusses the practica

Original language	English
Title of host publication	Corpus Linguistics 2001
Place of Publication	Lancaster, UK
Pages	46-57
Number of pages	12
DOIs	https://doi.org/10.1.1.7.4428
Publication status	Published - 1 Mar 2001
Event	Corpus Linguistics 2001 - Lancaster University, UK Duration: 1 Mar 2001 → …

Conference

Conference	Corpus Linguistics 2001
Period	1/03/01 → …

Access to Document

10.1.1.7.4428 (Licence: Unspecified)

corpus-linguistics-paper.pdf (Other version, 218 KB; Licence: Unspecified)

Cite this

@inproceedings{874eeeabbf59449788db37b4fe7839f1,

title = "Optimisation of corpus-derived probabilistic grammars",

abstract = "This paper examines the usefulness of corpus-derived probabilistic grammars as a basis for the automatic construction of grammars optimised for a given parsing task. Initially, a probabilistic context-free grammar (PCFG) is derived by a straightforward derivation technique from the Wall Street Journal (WSJ) Corpus, and a baseline is established by testing the resulting grammar on four different parsing tasks. In the first optimisation step, different kinds of local structural context (LSC) are incorporated into the basic PCFG. Improved parsing results demonstrate the usefulness of the added structural context information. In the second optimisation step, LSC-PCFGs are optimised in terms of grammar size and performance for a given parsing task. Tests show that significant improvements can be achieved by the method proposed. The structure of this paper is as follows. Section 2 discusses the practica",

author = "Anja Belz",

year = "2001",

month = mar,

day = "1",

doi = "10.1.1.7.4428",

language = "English",

pages = "46--57",

booktitle = "Corpus Linguistics 2001",

note = "Corpus Linguistics 2001 ; Conference date: 01-03-2001",

}

TY - GEN

T1 - Optimisation of corpus-derived probabilistic grammars

AU - Belz, Anja

PY - 2001/3/1

Y1 - 2001/3/1

N2 - This paper examines the usefulness of corpus-derived probabilistic grammars as a basis for the automatic construction of grammars optimised for a given parsing task. Initially, a probabilistic context-free grammar (PCFG) is derived by a straightforward derivation technique from the Wall Street Journal (WSJ) Corpus, and a baseline is established by testing the resulting grammar on four different parsing tasks. In the first optimisation step, different kinds of local structural context (LSC) are incorporated into the basic PCFG. Improved parsing results demonstrate the usefulness of the added structural context information. In the second optimisation step, LSC-PCFGs are optimised in terms of grammar size and performance for a given parsing task. Tests show that significant improvements can be achieved by the method proposed. The structure of this paper is as follows. Section 2 discusses the practica

AB - This paper examines the usefulness of corpus-derived probabilistic grammars as a basis for the automatic construction of grammars optimised for a given parsing task. Initially, a probabilistic context-free grammar (PCFG) is derived by a straightforward derivation technique from the Wall Street Journal (WSJ) Corpus, and a baseline is established by testing the resulting grammar on four different parsing tasks. In the first optimisation step, different kinds of local structural context (LSC) are incorporated into the basic PCFG. Improved parsing results demonstrate the usefulness of the added structural context information. In the second optimisation step, LSC-PCFGs are optimised in terms of grammar size and performance for a given parsing task. Tests show that significant improvements can be achieved by the method proposed. The structure of this paper is as follows. Section 2 discusses the practica

U2 - 10.1.1.7.4428

DO - 10.1.1.7.4428

M3 - Conference contribution with ISSN or ISBN

SP - 46

EP - 57

BT - Corpus Linguistics 2001

CY - Lancaster, UK

T2 - Corpus Linguistics 2001

Y2 - 1 March 2001

ER -