Stanford InfoLab Publication Server

Query Optimization for Semistructured Data

McHugh, J. and Widom, J. (1997) Query Optimization for Semistructured Data. Technical Report. Stanford InfoLab.

XML is an emerging standard for data representation and exchange on the World-Wide Web. Due to the nature of information on the Web and the inherent flexibility of XML, we expect that much of the data encoded in XML will be semistructured: the data may be irregular or incomplete, and its structure may change rapidly or unpredictably. This paper describes the query processor of Lore, a DBMS for XML-based data supporting an expressive query language. We focus primarily on Lore's cost-based query optimizer. While all of the usual problems associated with cost-based query optimization apply to XML-based query languages, a number of additional problems arise, such as new kinds of indexing, more complicated notions of database statistics, and vastly different query execution strategies for different databases. We define appropriate logical and physical query plans, database statistics, and a cost model, and we describe plan enumeration, including heuristics for reducing the large search space. Our optimizer is fully implemented in Lore, and preliminary performance results are reported.

Item Type: Techreport (Technical Report)
Uncontrolled Keywords: semistructured databases, cost-based query optimization
Subjects: Computer Science > Semistructured Data
Related URLs: Project Homepage
ID Code: 229
Deposited By: Import Account
Deposited On: 25 Feb 2000 16:00
Last Modified: 04 Jan 2009 11:59
