Brin, Sergey (1999) Extracting Patterns and Relations from the World Wide Web. Technical Report. Stanford InfoLab. (Publication Note: WebDB Workshop at EDBT'98)
Abstract
The World Wide Web is a vast resource for information. At the same time, it is extremely distributed. A particular type of data, such as restaurant lists, may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of extracting a relation for such a data type from all of these sources automatically. We present a technique that exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique, we use it to extract a relation of (author, title) pairs from the World Wide Web.
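The pattern/relation duality mentioned in the abstract can be illustrated with a toy bootstrapping loop. This is only a minimal sketch under strong assumptions, not the report's actual procedure: the "Web" is a small in-memory list of strings, a pattern is just the literal text before, between, and after the two fields on a line, and the names `find_patterns`, `apply_patterns`, `grow_relation`, and the sample `corpus` are hypothetical and exist only for illustration.

```python
import re

def find_patterns(corpus, pairs):
    """Relation -> patterns: record the literal context around each known pair."""
    patterns = set()
    for line in corpus:
        for author, title in pairs:
            a = line.find(author)
            if a == -1:
                continue
            # Assumption: the author always appears before the title.
            t = line.find(title, a + len(author))
            if t == -1:
                continue
            prefix = line[:a]
            middle = line[a + len(author):t]
            suffix = line[t + len(title):]
            patterns.add((prefix, middle, suffix))
    return patterns

def apply_patterns(corpus, patterns):
    """Patterns -> relation: extract every pair that occurs in a known context."""
    pairs = set()
    for prefix, middle, suffix in patterns:
        regex = re.compile(
            re.escape(prefix) + r"(.+?)" + re.escape(middle)
            + r"(.+?)" + re.escape(suffix)
        )
        for line in corpus:
            m = regex.fullmatch(line)
            if m:
                pairs.add((m.group(1), m.group(2)))
    return pairs

def grow_relation(corpus, seed, max_rounds=5):
    """Alternate between the two steps until no new pairs are found."""
    pairs = set(seed)
    for _ in range(max_rounds):
        patterns = find_patterns(corpus, pairs)
        new_pairs = apply_patterns(corpus, patterns)
        if new_pairs <= pairs:
            break  # fixed point: nothing new was extracted
        pairs |= new_pairs
    return pairs

# Hypothetical sample data standing in for crawled pages.
corpus = [
    "<li><b>Isaac Asimov</b>, <i>Foundation</i></li>",
    "<li><b>Frank Herbert</b>, <i>Dune</i></li>",
    "<li><b>William Gibson</b>, <i>Neuromancer</i></li>",
    "Book: The Hobbit by J. R. R. Tolkien.",
]
seed = {("Isaac Asimov", "Foundation")}

result = grow_relation(corpus, seed)
for author, title in sorted(result):
    print(author, "|", title)
```

Starting from the single seed pair, the loop learns the one list-item context and uses it to pick up the other two (author, title) pairs that share that formatting. The last line has a different layout and is not extracted. A realistic system would also need to score patterns for specificity so that an overly general pattern does not flood the relation with false pairs.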
Item Type: | Techreport (Technical Report) |
---|---|
Additional Information: | Previous number = SIDL-WP-1999-0119 |
Subjects: | Computer Science > Digital Libraries |
Projects: | Digital Libraries |
Related URLs: | Project Homepage: http://www-diglib.stanford.edu/diglib/pub/ |
ID Code: | 421 |
Deposited By: | Import Account |
Deposited On: | 30 Oct 2001 16:00 |
Last Modified: | 27 Dec 2008 16:15 |