Gyongyi, Zoltan and Garcia-Molina, Hector and Pedersen, Jan (2006) Web Content Categorization Using Link Information. Technical Report. Stanford.
Document categorization is one of the foundational problems in (web) information retrieval. Even though web documents are hyperlinked, most proposed classification techniques make little use of the link structure and rely primarily on text features, since it is not immediately clear how to make link information intelligible to supervised machine learning algorithms. This paper introduces a link-based approach to classification that can be used on its own or in conjunction with text-based classification. Results from several large-scale experiments indicate that link-based classification performs on par with text-based classification, and that combining the two offers the best of both worlds.
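This record does not describe how the report turns links into features. The following is only a hypothetical sketch, not the authors' method. It shows one common way to make link structure "intelligible" to a supervised learner. Each page gets a fixed-length vector holding the fraction of its already-labeled out-neighbors and in-neighbors that fall in each category. That vector can then be concatenated with a text feature vector. The function names, the graph representation, and the category labels are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def invert_links(out_links: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build an in-link map (page -> pages linking to it) from out-links."""
    in_links: Dict[str, List[str]] = defaultdict(list)
    for src, dsts in out_links.items():
        for dst in dsts:
            in_links[dst].append(src)
    return dict(in_links)


def neighbor_label_features(
    node: str,
    out_links: Dict[str, List[str]],
    in_links: Dict[str, List[str]],
    known_labels: Dict[str, str],
    categories: Sequence[str],
) -> List[float]:
    """Hypothetical link features: per-category fraction of labeled
    out-neighbors, followed by the same for labeled in-neighbors.
    Unlabeled neighbors are ignored; no labeled neighbors gives zeros."""
    features: List[float] = []
    for neighbors in (out_links.get(node, []), in_links.get(node, [])):
        counts: Dict[str, int] = defaultdict(int)
        for n in neighbors:
            if n in known_labels:
                counts[known_labels[n]] += 1
        total = sum(counts.values())
        features.extend([counts[c] / total if total else 0.0 for c in categories])
    return features


def combine_features(text_vec: Sequence[float], link_vec: Sequence[float]) -> List[float]:
    """Hypothetical combination: simple concatenation for a downstream classifier."""
    return list(text_vec) + list(link_vec)


if __name__ == "__main__":
    categories = ["sports", "science"]
    out_links = {"a": ["b", "c", "d"], "b": ["a"], "c": [], "d": ["a"]}
    known = {"b": "sports", "c": "sports", "d": "science"}
    in_links = invert_links(out_links)
    link_vec = neighbor_label_features("a", out_links, in_links, known, categories)
    print(combine_features([0.2, 0.8], link_vec))
```

Here page "a" links to two "sports" pages and one "science" page and is linked from one of each, so its link vector is roughly `[0.667, 0.333, 0.5, 0.5]`. Concatenation is the simplest way to feed both signals to a single classifier. Training separate text and link classifiers and merging their scores is an equally plausible alternative.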
|Item Type:||Techreport (Technical Report)|
|Uncontrolled Keywords:||web search, hypertext categorization, web link structure analysis|
|Subjects:||Computer Science > Data Mining; Computer Science > Databases and the Web|
|Related URLs:||Project Homepage: http://infolab.stanford.edu/|
|Deposited By:||Import Account|
|Deposited On:||19 Jul 2006 17:00|
|Last Modified:||18 Dec 2008 14:46|