Toutanova, Kristina and Manning, Christopher (2000) Enriching the Knowledge Sources used in a Maximum Entropy Part-of-Speech tagger. In: Joint SIGDAT Conference on Empirical Methods in NLP and Very Large Corpora (EMNLP/VLC-2000), October 7-8, 2000, Hong Kong.
Abstract
This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
Item Type: | Conference or Workshop Item (Paper) | |
---|---|---|
Uncontrolled Keywords: | part-of-speech tagging, maximum entropy, disambiguation | |
Subjects: | Computer Science | |
Projects: | Miscellaneous | |
Related URLs: | Project Homepage | http://www-nlp.stanford.edu/ |
ID Code: | 459 | |
Deposited By: | Import Account | |
Deposited On: | 16 Oct 2001 17:00 | |
Last Modified: | 27 Dec 2008 15:45 | |