Cho, J. and Garcia-Molina, H. (1999) The Evolution of the Web and Implications for an Incremental Crawler. Technical Report. Stanford.
In this paper we study how to build an effective incremental crawler. The crawler selectively and incrementally updates its index and/or local collection of web pages, instead of periodically refreshing the collection in batch mode. The incremental crawler can improve the ``freshness'' of the collection significantly and bring in new pages in a more timely manner. We first present results from an experiment conducted on more than half million web pages over 4 months, to estimate how web pages evolve over time. Based on these experimental results, we compare various design choices for an incremental crawler and discuss their trade-offs. We propose an architecture for the incremental crawler, which combines the best design choices.
|Item Type:||Techreport (Technical Report)|
|Uncontrolled Keywords:||web evolution, incremental crawler, web change model|
|Subjects:||Computer Science > Databases and the Web|
|Related URLs:||Project Homepage||http://www-diglib.stanford.edu/diglib/pub/|
|Deposited By:||Import Account|
|Deposited On:||25 Feb 2000 16:00|
|Last Modified:||27 Dec 2008 21:04|
Repository Staff Only: item control page