Toutanova, Kristina and Manning, Christopher (2000) Enriching the Knowledge Sources used in a Maximum Entropy Part-of-Speech tagger. In: Joint SIGDAT Conference on Empirical Methods in NLP and Very Large Corpora (EMNLP/VLC-2000), October 7-8, 2000, Hong Kong.
This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we obtain improved results by incorporating the following features: (i) more extensive treatment of capitalization for unknown words; (ii) features for disambiguating the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy of the tagger on the Penn Treebank is 96.86% overall and 86.91% on previously unseen words.
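As a rough illustration of what "enriching the information sources" can mean in a maximum entropy tagger, the Python sketch below extracts string-valued contextual predicates for one word and combines them in a log-linear model, p(t | h) = exp(Σ λ_i f_i(h, t)) / Z(h). The specific templates, the word lists (`AUXILIARIES`, `MODALS`, `PARTICLE_CANDIDATES`), the window sizes, and the weights are hypothetical assumptions chosen only to mirror items (i) to (iii); they are not the templates or parameters used in the paper.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical word lists for illustration only; not taken from the paper.
AUXILIARIES = {"have", "has", "had", "be", "am", "is", "are", "was", "were", "been", "being"}
MODALS = {"will", "would", "can", "could", "may", "might", "shall", "should", "must"}
PARTICLE_CANDIDATES = {"up", "down", "out", "off", "in", "on", "over", "back", "away"}


def extract_features(words: List[str], i: int, prev_tags: List[str], known: Set[str]) -> List[str]:
    """Contextual predicates for words[i], given the tags already assigned to words[:i]."""
    w, lw = words[i], words[i].lower()
    t1 = prev_tags[i - 1] if i >= 1 else "<S>"
    t2 = prev_tags[i - 2] if i >= 2 else "<S>"
    feats = [f"w={lw}", f"t-1={t1}", f"t-2,t-1={t2},{t1}"]

    if lw not in known:
        # Unknown word: suffixes plus richer capitalization cues (cf. item (i)).
        feats += [f"suf={lw[-k:]}" for k in range(1, min(4, len(lw)) + 1)]
        if w[:1].isupper():
            feats.append("cap,sent-initial" if i == 0 else "cap,mid-sentence")
        if w.isupper():
            feats.append("all-caps")
        if any(c.isdigit() for c in w):
            feats.append("has-digit")
        if "-" in w:
            feats.append("has-hyphen")

    # Verb-form cue (cf. item (ii)): nearest auxiliary or modal up to 4 words to the left.
    for j in range(i - 1, max(-1, i - 5), -1):
        if words[j].lower() in AUXILIARIES | MODALS:
            feats.append(f"prev-aux={words[j].lower()}")
            break

    # Particle cue (cf. item (iii)): a verb tag up to 3 words to the left.
    if lw in PARTICLE_CANDIDATES:
        for j in range(i - 1, max(-1, i - 4), -1):
            if prev_tags[j].startswith("VB"):
                feats.append(f"verb-before={words[j].lower()},{lw}")
                break
    return feats


def tag_distribution(feats: List[str], tags: List[str],
                     weights: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Log-linear model: p(t | h) = exp(sum of weights of active (feature, t) pairs) / Z(h)."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in tags}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


if __name__ == "__main__":
    words = ["He", "has", "walked", "up", "the", "hill"]
    known = {"he", "has", "up", "the", "hill"}  # "walked" is treated as unseen
    feats = extract_features(words, 2, ["PRP", "VBZ"], known)
    print(feats)
    # Hypothetical weights, standing in for values a maxent trainer would learn.
    weights = {("prev-aux=has", "VBN"): 2.0, ("suf=ed", "VBD"): 1.0, ("suf=ed", "VBN"): 1.0}
    print(tag_distribution(feats, ["VBD", "VBN"], weights))
```

In this toy run the suffix feature alone cannot separate VBD from VBN, but the preceding auxiliary "has" tips the distribution toward VBN (about 0.88 with these made-up weights), which is the kind of tense-form disambiguation item (ii) targets.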
| Item Type: | Conference or Workshop Item (Paper) |
| Uncontrolled Keywords: | part-of-speech tagging, maximum entropy, disambiguation |
| Related URLs: | Project Homepage: http://www-nlp.stanford.edu/ |
| Deposited By: | Import Account |
| Deposited On: | 16 Oct 2001 17:00 |
| Last Modified: | 27 Dec 2008 15:45 |