Klein, Dan and Manning, Christopher D. (2002) A Generative Constituent-Context Model for Improved Grammar Induction. Technical Report. Stanford.
Abstract
We present a generative distributional model for the unsupervised induction of natural language syntax, which explicitly models constituent yields and contexts. Parameter search with EM produces higher-quality analyses than previously exhibited by unsupervised systems, giving the best published unsupervised parsing results on the ATIS corpus. Experiments on Penn Treebank sentences of comparable length show an even higher F1 of 71% on non-trivial brackets. We examine the addition of weak supervision, and compare distributionally induced and actual part-of-speech tags as input data. We discuss errors made by the system, compare the system to previous models, and discuss upper bounds, lower bounds, and stability for this task.
Item Type: | Techreport (Technical Report) | |
---|---|---|
Uncontrolled Keywords: | nlp, parsing, unsupervised learning, grammar induction, language learning | |
Subjects: | Computer Science Miscellaneous | |
Projects: | Miscellaneous | |
Related URLs: | Project Homepage | http://www-nlp.stanford.edu/ |
ID Code: | 534 | |
Deposited By: | Import Account | |
Deposited On: | 03 Mar 2002 16:00 | |
Last Modified: | 25 Dec 2008 09:41 | |