Stanford InfoLab Publication Server

The Evolution of the Web and Implications for an Incremental Crawler

Cho, J. and Garcia-Molina, H. (1999) The Evolution of the Web and Implications for an Incremental Crawler. Technical Report. Stanford.

Abstract

In this paper we study how to build an effective incremental crawler. Instead of periodically refreshing its collection in batch mode, the crawler selectively and incrementally updates its index and/or local collection of web pages. An incremental crawler can significantly improve the "freshness" of the collection and bring in new pages in a more timely manner. We first present results from an experiment conducted on more than half a million web pages over 4 months, which estimate how web pages evolve over time. Based on these results, we compare various design choices for an incremental crawler and discuss their trade-offs. Finally, we propose an architecture for the incremental crawler that combines the best design choices.
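
The abstract does not define "freshness" precisely. The sketch below assumes, as a labeled modeling simplification, that freshness is the fraction of time a local copy matches the live page, that each page changes as a Poisson process with some rate (changes per day), and that the crawler revisits the page every fixed interval. Under those assumptions a page's time-averaged freshness is (1 - e^(-rate * interval)) / (rate * interval). The function names and the example collection are hypothetical illustrations, not the report's code or data; a Monte Carlo check is included to confirm the closed form.

```python
import math
import random


def expected_freshness(change_rate: float, revisit_interval: float) -> float:
    """Time-averaged probability that a local copy is up to date.

    Assumes (hypothetically) Poisson changes at `change_rate` per day and a
    re-crawl every `revisit_interval` days.
    """
    x = change_rate * revisit_interval
    if x == 0:
        return 1.0  # a page that never changes is always fresh
    return (1.0 - math.exp(-x)) / x


def collection_freshness(pages):
    """Mean expected freshness over (change_rate, revisit_interval) pairs."""
    return sum(expected_freshness(r, i) for r, i in pages) / len(pages)


def simulate_freshness(change_rate, revisit_interval, periods=20000, seed=0):
    """Monte Carlo estimate of the same quantity, for checking the formula."""
    rng = random.Random(seed)
    fresh_time = 0.0
    for _ in range(periods):
        # After each sync the copy stays fresh until the first change, which
        # arrives after an exponentially distributed wait in a Poisson process.
        first_change = rng.expovariate(change_rate) if change_rate > 0 else math.inf
        fresh_time += min(first_change, revisit_interval)
    return fresh_time / (periods * revisit_interval)


# Hypothetical collection: (changes per day, days between visits)
pages = [(1.0, 30.0), (0.1, 30.0), (0.01, 30.0)]
print(round(collection_freshness(pages), 3))
```

Shortening the revisit interval of a frequently changing page raises its freshness, which is the kind of trade-off that selective, incremental revisiting can exploit and a uniform batch refresh cannot.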

Item Type: Techreport (Technical Report)
Uncontrolled Keywords: web evolution, incremental crawler, web change model
Subjects: Computer Science > Databases and the Web
Projects: Digital Libraries
Related URLs: Project Homepage http://www-diglib.stanford.edu/diglib/pub/
ID Code: 376
Deposited By: Import Account
Deposited On: 25 Feb 2000 16:00
Last Modified: 27 Dec 2008 21:04