Stanford InfoLab Publication Server

Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank

Klein, Dan and Manning, Christopher D. (2001) Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank. In: 39th Annual Meeting of the Association for Computational Linguistics (ACL 2001), July 6-11, 2001, Toulouse, France.

Abstract

This paper presents empirical studies and closely corresponding theoretical models of the performance of a chart parser exhaustively parsing the Penn Treebank with the Treebank's own CFG grammar. We show how performance is dramatically affected by rule representation and tree transformations, but little by top-down vs. bottom-up strategies. We discuss grammatical saturation, including analysis of the strongly connected components of the phrasal nonterminals in the Treebank, and model how, as sentence length increases, the effective grammar rule size increases as regions of the grammar are unlocked, yielding super-cubic observed time behavior in some configurations.
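The strongly connected component analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy rule set, the names `TOY_RULES`, `nonterminal_graph`, and `strongly_connected_components`, and the choice of Kosaraju's algorithm are all assumptions made for illustration. The sketch treats any symbol with at least one rule as a phrasal nonterminal, draws an edge from A to B whenever B appears on the right-hand side of a rule for A, and groups mutually reachable nonterminals.

```python
# Hypothetical toy grammar as (lhs, rhs) pairs; NOT taken from the Penn Treebank.
# Symbols that never appear as a left-hand side (DT, NN, IN, VBD) are treated
# as preterminals and left out of the graph.
TOY_RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("DT", "NN")),
    ("NP", ("NP", "PP")),
    ("PP", ("IN", "NP")),
    ("VP", ("VBD", "NP")),
    ("VP", ("VP", "PP")),
]


def nonterminal_graph(rules):
    """Map each nonterminal A to the set of nonterminals on the RHS of A's rules."""
    nonterminals = {lhs for lhs, _ in rules}
    graph = {nt: set() for nt in nonterminals}
    for lhs, rhs in rules:
        for symbol in rhs:
            if symbol in nonterminals:
                graph[lhs].add(symbol)
    return graph


def strongly_connected_components(graph):
    """Kosaraju's algorithm: return the components as a list of frozensets."""
    # Pass 1: record nodes in order of DFS completion on the original graph.
    order, seen = [], set()

    def visit(node):
        seen.add(node)
        for successor in graph[node]:
            if successor not in seen:
                visit(successor)
        order.append(node)

    for node in graph:
        if node not in seen:
            visit(node)

    # Build the reversed graph.
    reverse = {node: set() for node in graph}
    for source, targets in graph.items():
        for target in targets:
            reverse[target].add(source)

    # Pass 2: explore the reversed graph in decreasing completion order;
    # each exploration collects exactly one component.
    components, assigned = [], set()
    for node in reversed(order):
        if node in assigned:
            continue
        assigned.add(node)
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            component.add(current)
            for predecessor in reverse[current]:
                if predecessor not in assigned:
                    assigned.add(predecessor)
                    stack.append(predecessor)
        components.append(frozenset(component))
    return components


components = strongly_connected_components(nonterminal_graph(TOY_RULES))
for component in components:
    print(sorted(component))
```

In this toy grammar NP and PP can each derive the other, so they share one component, while S and VP each form a component of their own. The recursive first pass is adequate for small grammars; a much larger symbol set might call for an iterative traversal.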

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: parsing, nlp, PCFG, empirical performance
Subjects: Computer Science
Projects: Miscellaneous
Related URLs: Project Homepage, http://www-nlp.stanford.edu/
ID Code: 506
Deposited By: Import Account
Deposited On: 08 Oct 2001 17:00
Last Modified: 27 Dec 2008 10:18
