Stanford InfoLab Publication Server

Enriching the Knowledge Sources used in a Maximum Entropy Part-of-Speech tagger

Toutanova, Kristina and Manning, Christopher (2000) Enriching the Knowledge Sources used in a Maximum Entropy Part-of-Speech tagger. In: Joint SIGDAT Conference on Empirical Methods in NLP and Very Large Corpora (EMNLP/VLC-2000), October 7-8, 2000, Hong Kong.

This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
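
As a rough illustration of how capitalization cues can feed a maximum-entropy tagger, the sketch below extracts a few binary features from a word and turns feature weights into a tag distribution with the standard log-linear form. It is not the feature set or the implementation from the paper: the function names, the feature names, and the weights in the usage example are all hypothetical, chosen only to make the idea concrete.

```python
import math
from typing import Dict, List, Tuple


def capitalization_features(word: str, position: int) -> List[str]:
    """Return hypothetical capitalization cues for a word.

    `position` is the word's 0-based index in its sentence. The feature
    names are illustrative assumptions, not the paper's templates.
    """
    feats = []
    if word[:1].isupper():
        feats.append("init_cap")
        # Capitalization away from the sentence start is a stronger cue.
        if position > 0:
            feats.append("init_cap_not_sentence_start")
    if word.isupper():
        feats.append("all_caps")
    # An uppercase letter after the first character, in a word that is
    # not fully uppercase (e.g. "iPhone", "McDonald").
    if any(c.isupper() for c in word[1:]) and not word.isupper():
        feats.append("mixed_case")
    return feats


def tag_distribution(
    feats: List[str],
    tags: List[str],
    weights: Dict[Tuple[str, str], float],
) -> Dict[str, float]:
    """Log-linear model: p(tag | feats) is proportional to
    exp(sum of the weights of the active (feature, tag) pairs)."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in tags}
    z = sum(math.exp(s) for s in scores.values())  # normalizing constant
    return {t: math.exp(s) / z for t, s in scores.items()}


# Usage with made-up weights; a real tagger would estimate them from data.
toy_weights = {("init_cap_not_sentence_start", "NNP"): 1.0}
feats = capitalization_features("Treebank", 4)
dist = tag_distribution(feats, ["NNP", "NN"], toy_weights)
```

With these toy weights, a mid-sentence capitalized word receives more probability mass for the proper-noun tag than for the common-noun tag. In a trained tagger the weights would be fit to annotated data rather than set by hand.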

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: part-of-speech tagging, maximum entropy, disambiguation
Subjects: Computer Science
Related URLs: Project Homepage
ID Code: 459
Deposited By: Import Account
Deposited On: 16 Oct 2001 17:00
Last Modified: 27 Dec 2008 15:45
