Smarr, Joseph and Manning, Christopher D. (2002) Classifying Unknown Proper Noun Phrases Without Context. Technical Report. Stanford.
Abstract
We present a probabilistic generative model used to classify unknown Proper Noun Phrases into semantic categories. The core of the classifier is an n-gram character model, which is enhanced with an n-gram word-length model and a common word model. While most work has depended largely on context or domain-specific rules for semantic disambiguation of unknown names, we demonstrate that there is surprisingly reliable statistical information available in the composition of the names themselves. The context-independent probabilities assigned by our domain-independent classifier are sufficient to achieve greater than 90% classification accuracy on typical tasks.
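The idea of a character n-gram generative classifier can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only a character model (the word-length and common word models mentioned in the abstract are omitted), it assumes trigrams with add-one smoothing, and the class name `CharNgramClassifier`, the category labels, and the toy training names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class CharNgramClassifier:
    """Toy generative classifier: P(label) * prod P(char | previous n-1 chars, label)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n                      # n-gram order (assumed: trigrams)
        self.alpha = alpha              # additive smoothing constant (assumed)
        self.ngram_counts = defaultdict(Counter)    # label -> (context, char) counts
        self.context_counts = defaultdict(Counter)  # label -> context counts
        self.label_counts = Counter()
        self.alphabet = set()

    def _ngrams(self, name):
        # Pad with start markers and a single end marker so that
        # name beginnings and endings are modeled too.
        padded = "^" * (self.n - 1) + name.lower() + "$"
        for i in range(self.n - 1, len(padded)):
            yield padded[i - self.n + 1:i], padded[i]

    def fit(self, examples):
        for name, label in examples:
            self.label_counts[label] += 1
            for context, ch in self._ngrams(name):
                self.ngram_counts[label][(context, ch)] += 1
                self.context_counts[label][context] += 1
                self.alphabet.add(ch)
        return self

    def log_score(self, name, label):
        # Reserve one extra vocabulary slot for characters never seen in training.
        vocab = len(self.alphabet) + 1
        total = sum(self.label_counts.values())
        score = math.log(self.label_counts[label] / total)
        for context, ch in self._ngrams(name):
            num = self.ngram_counts[label][(context, ch)] + self.alpha
            den = self.context_counts[label][context] + self.alpha * vocab
            score += math.log(num / den)
        return score

    def predict(self, name):
        # Pick the label whose model assigns the name the highest probability.
        return max(self.label_counts, key=lambda lab: self.log_score(name, lab))


# Hypothetical toy data, far too small for real use.
training = [
    ("Springfield", "place"), ("Nashville", "place"), ("Kingston", "place"),
    ("Acme Corp", "company"), ("Globex Inc", "company"), ("Initech Corp", "company"),
]
clf = CharNgramClassifier().fit(training)
print(clf.predict("Zenith Corp"), clf.predict("Brookville"))
```

The classifier sees only the characters of the name itself, with no surrounding sentence, which is the sense in which the classification is context-independent.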
| Field | Value |
|---|---|
| Item Type | Techreport (Technical Report) |
| Uncontrolled Keywords | named-entity classification, unknown words, probabilistic modeling, n-grams |
| Subjects | Miscellaneous |
| Projects | Miscellaneous |
| Related URLs | Project Homepage: http://www-nlp.stanford.edu/ |
| ID Code | 554 |
| Deposited By | Import Account |
| Deposited On | 29 Sep 2002 17:00 |
| Last Modified | 25 Dec 2008 10:08 |