Haveliwala, Taher H. and Gionis, Aristides and Klein, Dan and Indyk, Piotr (2002) Evaluating Strategies for Similarity Search on the Web. In: Eleventh International World Wide Web Conference (WWW 2002), May 7-11, 2002, Honolulu, Hawaii.
This is the latest version of this item.
Finding pages on the Web that are similar to a query page (Related Pages) is an important component of modern search engines. A variety of strategies have been proposed for answering Related Pages queries, but comparative evaluation by user studies is expensive, especially when large strategy spaces must be searched (e.g., when tuning parameters). We present a technique for automatically evaluating strategies using Web hierarchies, such as Open Directory, in place of user feedback. We apply this evaluation methodology to a mix of document representation strategies, including the use of text, anchor-text, and links. We discuss the relative advantages and disadvantages of the various approaches examined. Finally, we describe how to efficiently construct a similarity index out of our chosen strategies, and provide sample results from our index.
Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: related pages, similarity search, search, evaluation, Open Directory Project
Subjects: Computer Science > Data Mining; Computer Science > Databases and the Web
Related URLs: Project Homepage (http://infolab.stanford.edu/), Project Homepage (http://www-nlp.stanford.edu/)
Deposited By: Import Account
Deposited On: 13 Feb 2002 16:00
Last Modified: 25 Dec 2008 09:32
Available Versions of this Item
- Similarity Search on the Web: Evaluation and Scalability Considerations. (deposited 25 Feb 2001 16:00)
- Evaluating Strategies for Similarity Search on the Web. (deposited 13 Feb 2002 16:00) [Currently Displayed]