Stanford InfoLab Publication Server

Querying Documents in Object Databases

Abiteboul, S. and Cluet, S. and Christophides, V. and Milo, T. and Moerkotte, G. and Siméon, J. (1997) Querying Documents in Object Databases. Technical Report. Stanford InfoLab. (Publication Note: International Journal on Digital Libraries, Volume 1, Number 1, April 1997)

Abstract

We consider the problem of storing and accessing documents (SGML and HTML, in particular) using database technology. To specify the database image of documents, we use structuring schemas that consist of grammars annotated with database programs. To query documents, we introduce an extension of OQL, the ODMG standard query language for object databases. Our extension (named OQL-doc) allows querying documents without precise knowledge of their structure, using in particular generalized path expressions and pattern matching. This allows us to introduce, in a declarative language (in the style of SQL or OQL), navigational and information retrieval styles of accessing data. Query processing in the context of documents and path expressions leads to challenging implementation issues. We extend an object algebra with new operators to deal with generalized path expressions. We then consider two essential complementary optimization techniques: (1) we show that almost standard database optimization techniques can be used to answer queries without having to load the entire document into the database; (2) we also consider the interaction of full-text indexes (e.g., inverted files) with standard database collection indexes (e.g., B-trees) that provide important speedup. The paper is an overview of a research project at INRIA-Rocquencourt. Some particular aspects are detailed in [ACM93, CACS94, ACM95, CCM96].

Item Type: Techreport (Technical Report)
Uncontrolled Keywords: SGML, object database, semistructured, document
Subjects: Computer Science > Query Processing
Projects: Lore
Related URLs: Project Homepage: http://infolab.stanford.edu/lore/
ID Code: 244
Deposited By: Import Account
Deposited On: 25 Feb 2000 16:00
Last Modified: 30 Dec 2008 09:27
