Extracting Patterns and Relations from the World Wide Web.

Brin, Sergey (1999) Extracting Patterns and Relations from the World Wide Web. Technical Report. Stanford InfoLab. (Publication Note: WebDB Workshop at EDBT'98)




The World Wide Web is a vast resource for information. At the same time, it is extremely distributed. A particular type of data, such as restaurant lists, may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of automatically extracting a relation for such a data type from all of these sources. We present a technique that exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique, we use it to extract a relation of (author, title) pairs from the World Wide Web.
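
The pattern/relation duality mentioned above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the method as specified in the paper: the pattern representation (a fixed-width text prefix, the text between author and title, and a fixed-width suffix), the function names `find_patterns`, `apply_patterns`, and `grow_relation`, and the sample documents are all hypothetical and chosen only for illustration.

```python
import re

def find_patterns(pairs, docs, context=4):
    """Derive (prefix, middle, suffix) patterns from occurrences of known pairs.

    Assumption: the author always appears before the title, separated by
    at most 20 characters. A real system would need a richer pattern model.
    """
    patterns = set()
    for author, title in pairs:
        occurrence = re.compile(
            re.escape(author) + r"(.{1,20}?)" + re.escape(title)
        )
        for doc in docs:
            for m in occurrence.finditer(doc):
                prefix = doc[max(0, m.start() - context):m.start()]
                suffix = doc[m.end():m.end() + context]
                # Require context on both sides so the pattern is anchored.
                if prefix and suffix:
                    patterns.add((prefix, m.group(1), suffix))
    return patterns

def apply_patterns(patterns, docs):
    """Extract candidate (author, title) pairs matching any known pattern."""
    pairs = set()
    for prefix, middle, suffix in patterns:
        regex = re.compile(
            re.escape(prefix) + r"(.+?)" + re.escape(middle)
            + r"(.+?)" + re.escape(suffix)
        )
        for doc in docs:
            for m in regex.finditer(doc):
                pairs.add((m.group(1), m.group(2)))
    return pairs

def grow_relation(seed, docs, rounds=3):
    """Alternate between finding patterns and finding pairs until no growth."""
    relation = set(seed)
    for _ in range(rounds):
        patterns = find_patterns(relation, docs)
        grown = relation | apply_patterns(patterns, docs)
        if grown == relation:
            break
        relation = grown
    return relation

# Made-up sample documents standing in for Web pages.
docs = [
    "<li><b>Isaac Asimov</b> wrote <i>Foundation</i></li>"
    "<li><b>Frank Herbert</b> wrote <i>Dune</i></li>",
    "<li><b>Ursula K. Le Guin</b> wrote <i>The Dispossessed</i></li>",
]
seed = {("Isaac Asimov", "Foundation")}
result = grow_relation(seed, docs)
```

Starting from the single seed pair, the sketch learns one pattern from the seed's occurrence and uses it to pick up the other pairs that share the same formatting. It has no safeguard against overly general patterns, which would matter on real, noisy Web data.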

