Stanford InfoLab Publication Server

Extracting Patterns and Relations from the World Wide Web.

Brin, Sergey (1999) Extracting Patterns and Relations from the World Wide Web. Technical Report. Stanford InfoLab. (Publication Note: WebDB Workshop at EDBT'98)

The World Wide Web is a vast resource for information. At the same time, it is extremely distributed. A particular type of data, such as restaurant lists, may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of extracting a relation for such a data type from all of these sources automatically. We present a technique that exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique, we use it to extract a relation of (author, title) pairs from the World Wide Web.
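The abstract outlines the pattern/relation duality only at a high level. Known pairs reveal the text patterns that surround them, and those patterns in turn reveal new pairs. The minimal sketch below shows that alternating loop under simplifying assumptions that do not come from the report. The corpus is a list of in-memory strings, and a pattern is a hypothetical (prefix, middle, suffix) triple of literal text. The names `CONTEXT`, `MAX_MIDDLE`, `find_patterns`, `apply_patterns`, and `grow_relation` are illustrative. This is not the report's own pattern definition or implementation.

```python
import re
from typing import Sequence, Set, Tuple

Pair = Tuple[str, str]          # (author, title)
Pattern = Tuple[str, str, str]  # hypothetical simplified form: (prefix, middle, suffix)

CONTEXT = 3       # hypothetical: characters of literal context kept on each side
MAX_MIDDLE = 20   # hypothetical: longest text allowed between author and title


def find_patterns(docs: Sequence[str], relation: Set[Pair]) -> Set[Pattern]:
    """Relation -> patterns: record the literal text around each known pair."""
    patterns: Set[Pattern] = set()
    for doc in docs:
        for author, title in relation:
            regex = re.escape(author) + "(.{1,%d}?)" % MAX_MIDDLE + re.escape(title)
            for m in re.finditer(regex, doc):
                prefix = doc[max(0, m.start() - CONTEXT):m.start()]
                suffix = doc[m.end():m.end() + CONTEXT]
                # Skip matches too close to a document edge to yield full context.
                if len(prefix) == CONTEXT and len(suffix) == CONTEXT:
                    patterns.add((prefix, m.group(1), suffix))
    return patterns


def apply_patterns(docs: Sequence[str], patterns: Set[Pattern]) -> Set[Pair]:
    """Patterns -> relation: extract every pair that fits a known pattern."""
    found: Set[Pair] = set()
    for prefix, middle, suffix in patterns:
        # Fields may not contain markup brackets, so matches stay inside one element.
        regex = (re.escape(prefix) + r"([^<>]+?)" + re.escape(middle)
                 + r"([^<>]+?)" + re.escape(suffix))
        for doc in docs:
            found.update(re.findall(regex, doc))
    return found


def grow_relation(docs: Sequence[str], seeds: Set[Pair], max_rounds: int = 5) -> Set[Pair]:
    """Alternate the two steps until a round adds no new pairs."""
    relation = set(seeds)
    for _ in range(max_rounds):
        new_pairs = apply_patterns(docs, find_patterns(docs, relation))
        if new_pairs <= relation:
            break
        relation |= new_pairs
    return relation


docs = [
    "<li><b>Isaac Asimov</b> wrote <i>Foundation</i>.</li>",
    "<li><b>Frank Herbert</b> wrote <i>Dune</i>.</li>",
    "<p>Notes on <b>restaurants</b> nearby.</p>",
]
print(sorted(grow_relation(docs, {("Isaac Asimov", "Foundation")})))
# [('Frank Herbert', 'Dune'), ('Isaac Asimov', 'Foundation')]
```

Starting from one seed pair, the first round learns the pattern `("<b>", "</b> wrote <i>", "</i")` and uses it to pick up the second pair. The second round finds nothing new, so the loop stops. With literal contexts this short, a pattern can easily be too general on real pages. Any practical version would need some check on how specific a pattern is before trusting what it extracts.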

Item Type: Technical Report
Additional Information: Previous number = SIDL-WP-1999-0119
Subjects: Computer Science > Digital Libraries
Projects: Digital Libraries
Related URLs: Project Homepage
ID Code: 421
Deposited By: Import Account
Deposited On: 30 Oct 2001 16:00
Last Modified: 27 Dec 2008 16:15