Haveliwala, Taher H. (2002) Efficient Encodings for Document Ranking Vectors. Technical Report. Stanford.
The rapid growth of the Web has led to the development of many techniques for enhancing search rankings by using precomputed numeric document attributes such as the estimated popularity or importance of Web pages. For efficient keyword-search query processing over large document repositories, it is vital that these auxiliary attribute vectors, containing numeric per-document properties, be kept in main memory. When only a small number of attribute vectors are used by the system (e.g., a document-length vector for implementing the cosine ranking scheme), a standard 4-byte, single-precision floating point representation for the numeric values suffices. However, for richer search rankings, which incorporate additional numeric attributes (e.g., a set of page-importance estimates for each page), it becomes more difficult to maintain all of the auxiliary ranking vectors in main memory. We propose lossy encoding schemes based on scalar quantization that efficiently encode auxiliary numeric properties, such as PageRank, an estimate of page importance used by the Google search engine. Unlike standard scalar quantization algorithms, which concentrate on minimizing the numerical distortion caused by lossy encodings, we seek to minimize the distortion of search-result rankings.
|Item Type:||Techreport (Technical Report)|
|Uncontrolled Keywords:||compression, PageRank, web search, ranking|
|Subjects:||Computer Science > Databases and the Web|
|Related URLs:||Project Homepage, Project Homepage||http://infolab.stanford.edu/, http://infolab.stanford.edu/|
|Deposited By:||Import Account|
|Deposited On:||01 Dec 2002 16:00|
|Last Modified:||25 Dec 2008 09:27|
Repository Staff Only: item control page