Enriching the Knowledge Sources used in a Maximum Entropy Part-of-Speech tagger

Toutanova, Kristina and Manning, Christopher (2000) Enriching the Knowledge Sources used in a Maximum Entropy Part-of-Speech tagger. In: Joint SIGDAT Conference on Empirical Methods in NLP and Very Large Corpora (EMNLP/VLC-2000), October 7-8, 2000, Hong Kong.

Abstract

This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
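
As a rough illustration of what item (i) might look like in practice, the sketch below computes a few binary capitalization features for a single token. The function name and all feature names are hypothetical, chosen only for illustration; they are not the feature templates defined in the paper, and a real maximum entropy tagger would combine such indicators with the candidate tag.

```python
from typing import Dict, List


def capitalization_features(tokens: List[str], i: int) -> Dict[str, bool]:
    """Return hypothetical binary capitalization features for tokens[i].

    These names are illustrative assumptions, not the paper's features.
    """
    word = tokens[i]
    starts_upper = word[:1].isupper()
    return {
        # First character is an uppercase letter.
        "is_capitalized": starts_upper,
        # Every cased character is uppercase, e.g. an acronym.
        "is_all_caps": word.isupper(),
        # An uppercase letter appears after the first character.
        "has_inner_capital": any(c.isupper() for c in word[1:]),
        # Capitalized but not sentence-initial, a common proper-noun cue.
        "capitalized_mid_sentence": i > 0 and starts_upper,
    }


tokens = ["The", "Penn", "Treebank", "uses", "McDonald", "IBM"]
for index, token in enumerate(tokens):
    print(token, capitalization_features(tokens, index))
```

Separating sentence-initial capitalization from mid-sentence capitalization is one plausible design choice here, since a capital letter at the start of a sentence carries much less information about the tag of an unknown word.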

Item Type:	Conference or Workshop Item (Paper)
Uncontrolled Keywords:	part-of-speech tagging, maximum entropy, disambiguation
Subjects:	Computer Science
Projects:	Miscellaneous
Related URLs:	Project Homepage	http://www-nlp.stanford.edu/
ID Code:	459
Deposited By:	Import Account
Deposited On:	16 Oct 2001 17:00
Last Modified:	27 Dec 2008 15:45
