Stanford InfoLab Publication Server

Complex Queries over Web Repositories

Raghavan, Sriram and Garcia-Molina, Hector (2003) Complex Queries over Web Repositories. Technical Report. Stanford.




Web repositories, such as the Stanford WebBase repository, manage large heterogeneous collections of Web pages and associated indexes. For effective analysis and mining, these repositories must provide a declarative query interface that supports "complex expressive Web queries". Such queries have two key characteristics: (i) they view a Web repository simultaneously as a collection of text documents, as a navigable directed graph, and as a set of relational tables storing properties of Web pages (length, URL, title, etc.); and (ii) they employ application-specific ranking and ordering relationships over pages and links to filter out and retrieve only the "best" query results. In this paper, we model a Web repository in terms of "Web relations" and describe an algebra for expressing complex Web queries. Our algebra extends traditional relational operators as well as graph navigation operators to uniformly handle plain, ranked, and ordered Web relations. In addition, we present an overview of the cost-based optimizer and execution engine that we have developed to efficiently execute Web queries over large repositories.
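To make the two characteristics above concrete, the following is a minimal Python sketch of a query that touches all three views of a repository and then keeps only the best-ranked results. It is an illustration only and does not reproduce the paper's algebra, optimizer, or any WebBase API. The `Page` record, the toy `PAGES` and `LINKS` data, and the helper names `contains`, `select`, `in_degree`, `top_k`, and `query` are all hypothetical, and ranking by in-link count is an assumed stand-in for an application-specific ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Hypothetical relational view of a page: one row of page properties."""
    url: str
    title: str
    length: int
    text: str


# Toy repository contents (invented for illustration).
PAGES = [
    Page("http://a.example/db", "Databases", 1200, "relational database systems"),
    Page("http://b.example/ir", "Retrieval", 800, "text retrieval and database indexing"),
    Page("http://c.example/hub", "Hub", 300, "links to database resources"),
    Page("http://d.example/misc", "Misc", 500, "unrelated content"),
]

# Directed graph view: (source URL, destination URL) pairs.
LINKS = [
    ("http://c.example/hub", "http://a.example/db"),
    ("http://c.example/hub", "http://b.example/ir"),
    ("http://d.example/misc", "http://a.example/db"),
]


def contains(pages, word):
    """Text view: keep pages whose text contains the given word."""
    return [p for p in pages if word in p.text.split()]


def select(pages, predicate):
    """Relational view: keep pages whose properties satisfy the predicate."""
    return [p for p in pages if predicate(p)]


def in_degree(links):
    """Graph view: count incoming links per destination URL."""
    counts = {}
    for _, dst in links:
        counts[dst] = counts.get(dst, 0) + 1
    return counts


def top_k(pages, score, k):
    """Ranking: order by descending score (URL breaks ties) and keep k pages."""
    return sorted(pages, key=lambda p: (-score(p), p.url))[:k]


def query(pages, links, word, min_length, k):
    """Combine the text, relational, and graph views, then keep the top k."""
    matches = contains(pages, word)
    long_enough = select(matches, lambda p: p.length >= min_length)
    degree = in_degree(links)
    return top_k(long_enough, lambda p: degree.get(p.url, 0), k)


if __name__ == "__main__":
    for page in query(PAGES, LINKS, "database", 500, 2):
        print(page.url)
```

In this sketch the text filter, the property filter, and the link-based ranking are composed as ordinary functions; a declarative interface of the kind the abstract describes would instead let an optimizer choose the order in which such steps run.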

Item Type: Techreport (Technical Report)
Uncontrolled Keywords: Complex queries, Web repositories, Ordered relations, Web graph navigation
Subjects: Computer Science > Databases and the Web
Projects: Digital Libraries
Related URLs: Project Homepage
ID Code: 576
Deposited By: Import Account
Deposited On: 16 Feb 2003 16:00
Last Modified: 07 Oct 2008 12:16
