Cho, J. and Garcia-Molina, H. (1999) The Evolution of the Web and Implications for an Incremental Crawler. Technical Report. Stanford.
Abstract
In this paper we study how to build an effective incremental crawler. The crawler selectively and incrementally updates its index and/or local collection of web pages, instead of periodically refreshing the collection in batch mode. The incremental crawler can improve the "freshness" of the collection significantly and bring in new pages in a more timely manner. We first present results from an experiment conducted on more than half a million web pages over 4 months, to estimate how web pages evolve over time. Based on these experimental results, we compare various design choices for an incremental crawler and discuss their trade-offs. We propose an architecture for the incremental crawler, which combines the best design choices.
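The contrast between batch refreshing and selective, incremental updating can be illustrated with a toy model. This is a minimal sketch, not the architecture proposed in the report. It assumes "freshness" means the fraction of local copies that match the live page, and it picks pages to refresh by highest estimated change rate, which is a placeholder rule chosen for illustration. All names (`LocalCopy`, `est_change_rate`, `incremental_refresh`) and the example URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LocalCopy:
    url: str
    local_version: int      # version held in the local collection
    live_version: int       # version on the web (visible only in this toy model)
    est_change_rate: float  # assumed estimate of how often the page changes


def freshness(collection):
    """Fraction of local copies that match the live page (assumed definition)."""
    if not collection:
        return 0.0
    current = sum(1 for c in collection if c.local_version == c.live_version)
    return current / len(collection)


def incremental_refresh(collection, budget):
    """Re-fetch only the `budget` pages with the highest estimated change rate.

    A batch crawler would re-fetch every page; this placeholder policy
    touches a small subset and updates the copies in place.
    """
    ranked = sorted(collection, key=lambda c: c.est_change_rate, reverse=True)
    refreshed = []
    for copy in ranked[:budget]:
        copy.local_version = copy.live_version  # stands in for a real fetch
        refreshed.append(copy.url)
    return refreshed


collection = [
    LocalCopy("http://example.com/a", 1, 1, 0.1),
    LocalCopy("http://example.com/b", 1, 2, 0.9),
    LocalCopy("http://example.com/c", 3, 3, 0.2),
    LocalCopy("http://example.com/d", 1, 4, 0.5),
]

before = freshness(collection)                        # two of four copies are current
refreshed = incremental_refresh(collection, budget=1)
after = freshness(collection)                         # the refreshed page is now current
```

With a budget of one fetch, the sketch refreshes only the page it expects to change most often. A real crawler cannot see `live_version` without fetching the page, so it would have to estimate change rates from past observations.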
| Field | Value |
|---|---|
| Item Type | Techreport (Technical Report) |
| Uncontrolled Keywords | web evolution, incremental crawler, web change model |
| Subjects | Computer Science > Databases and the Web |
| Projects | Digital Libraries |
| Related URLs | Project Homepage: http://www-diglib.stanford.edu/diglib/pub/ |
| ID Code | 376 |
| Deposited By | Import Account |
| Deposited On | 25 Feb 2000 16:00 |
| Last Modified | 27 Dec 2008 21:04 |