Raghavan, Sriram and Garcia-Molina, Hector (2003) Complex Queries over Web Repositories. Technical Report. Stanford.
PDF (4 MB)
Abstract
Web repositories, such as the Stanford WebBase repository, manage large heterogeneous collections of Web pages and associated indexes. For effective analysis and mining, these repositories must provide a declarative query interface that supports "complex expressive Web queries". Such queries have two key characteristics: (i) they view a Web repository simultaneously as a collection of text documents, as a navigable directed graph, and as a set of relational tables storing properties of Web pages (length, URL, title, etc.); (ii) they employ application-specific ranking and ordering relationships over pages and links to filter out and retrieve only the "best" query results. In this paper, we model a Web repository in terms of "Web relations" and describe an algebra for expressing complex Web queries. Our algebra extends traditional relational operators as well as graph navigation operators to uniformly handle plain, ranked, and ordered Web relations. In addition, we present an overview of the cost-based optimizer and execution engine that we have developed to efficiently execute Web queries over large repositories.
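To make the three simultaneous views concrete, here is a minimal Python sketch of one query that touches all of them: a text predicate, a relational predicate on a page property, one hop of graph navigation, and a ranked top-k filter. Everything here is a hypothetical illustration, not the report's algebra or the WebBase API: the `Page` record, the operator names `select`, `text_match`, `navigate`, and `top_k`, the toy pages and links, and the choice of in-degree as the ranking function are all assumptions. Ordered relations and the cost-based optimizer are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy model of a "Web relation": page records plus link edges.
@dataclass(frozen=True)
class Page:
    url: str
    title: str
    length: int
    text: str

Pages = List[Page]
Links = Set[Tuple[str, str]]  # (source URL, target URL)

def select(pages: Pages, pred: Callable[[Page], bool]) -> Pages:
    """Relational view: keep pages whose properties satisfy a predicate."""
    return [p for p in pages if pred(p)]

def text_match(pages: Pages, keyword: str) -> Pages:
    """Text view: keep pages whose text contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [p for p in pages if kw in p.text.lower()]

def navigate(pages: Pages, all_pages: Pages, links: Links) -> Pages:
    """Graph view: follow one hop of out-links from the given pages."""
    sources = {p.url for p in pages}
    targets = {dst for (src, dst) in links if src in sources}
    return [p for p in all_pages if p.url in targets]

def top_k(pages: Pages, score: Callable[[Page], float], k: int) -> List[Tuple[Page, float]]:
    """Ranked relation: attach a score and keep only the k best pages.
    Ties are broken by URL so the result is deterministic."""
    ranked = sorted(((p, score(p)) for p in pages), key=lambda ps: (-ps[1], ps[0].url))
    return ranked[:k]

def in_degree(links: Links) -> Dict[str, int]:
    """Count incoming links per target URL (an assumed ranking signal)."""
    counts: Dict[str, int] = {}
    for _, dst in links:
        counts[dst] = counts.get(dst, 0) + 1
    return counts

# Toy repository (invented data, for illustration only).
pages = [
    Page("a.com", "A", 1200, "Web databases and query languages"),
    Page("b.com", "B", 800, "Graph navigation"),
    Page("c.com", "C", 5000, "Databases"),
    Page("d.com", "D", 300, "Ranking pages"),
]
links = {("a.com", "b.com"), ("a.com", "d.com"), ("c.com", "d.com"), ("b.com", "d.com")}

# Query: short pages mentioning "database" -> their out-neighbors -> best 2 by in-degree.
deg = in_degree(links)
seeds = select(text_match(pages, "database"), lambda p: p.length < 2000)
neighbors = navigate(seeds, pages, links)
best = top_k(neighbors, lambda p: float(deg.get(p.url, 0)), k=2)
print([(p.url, s) for p, s in best])  # [('d.com', 3.0), ('b.com', 1.0)]
```

Composing the operators as plain functions keeps each view separate, which is the point of the sketch; a real engine would instead build an operator tree so that an optimizer can reorder, for example, the cheap length filter ahead of the text match.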
| Field | Value |
|---|---|
| Item Type | Techreport (Technical Report) |
| Uncontrolled Keywords | Complex queries, Web repositories, Ordered relations, Web graph navigation |
| Subjects | Computer Science > Databases and the Web |
| Projects | Digital Libraries |
| Related URLs | Project Homepage: http://www-diglib.stanford.edu/diglib/pub/ |
| ID Code | 576 |
| Deposited By | Import Account |
| Deposited On | 16 Feb 2003 16:00 |
| Last Modified | 07 Oct 2008 12:16 |