Stanford InfoLab Publication Server

Boolean Query Mapping Across Heterogeneous Information Sources

Chang, C. and Garcia-Molina, H. and Paepcke, A. (1996) Boolean Query Mapping Across Heterogeneous Information Sources. Technical Report. Stanford InfoLab. (Publication Note: IEEE Transactions on Knowledge and Data Engineering as part of a special section of concise research papers on Digital Libraries (Aug. 1996).)




Searching over heterogeneous information sources is difficult because of the non-uniform query languages. Our approach is to allow a user to compose Boolean queries in one rich front-end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that may return extra documents. The results are then processed by a filter query to yield the correct final result. In this paper we introduce the architecture and associated algorithms for generating the supported subsuming queries and filters. We show that generated subsuming queries return a minimal number of documents; we also discuss how minimal-cost filters can be obtained. We have implemented prototype versions of these algorithms and demonstrated them on heterogeneous Boolean systems.

Index Terms: Boolean queries, query translation, information retrieval, heterogeneity, digital libraries, query subsumption, filtering.

I. INTRODUCTION

Emerging Digital Libraries can provide a wealth of information. However, there is also a wealth of search engines behind these libraries, each with a different document model and query language. Our goal is to provide a front-end to a collection of Digital Libraries that hides, as much as possible, this heterogeneity. As a first step, in this paper we focus on translating Boolean queries [18][6], from a generalized form, into queries that only use the functionality and syntax provided by a particular target search engine. We initially look at Boolean queries because they are used by most current commercial systems; eventually we will incorporate other types of queries, such as vector space and probabilistic-model ones [18][6]. The following example illustrates our approach.

Example 1.1. Suppose that a user is interested in documents discussing multiprocessors and distributed systems. Say the user's query is originally formulated as follows:

User Query: Title Contains multiprocessor AND distributed W system

This query selects documents wi
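The subsume-then-filter idea described in the abstract and in Example 1.1 can be illustrated with a small Python sketch. It is not the authors' algorithm: the query model (`Term`, `And`, `Or`, `Within`), the assumed meaning of `W` as "the first word immediately followed by the second", and a target source that supports AND/OR but not `W` are all hypothetical. Each document is reduced to the word list of its title, and there is no NOT operator, so replacing a condition with a looser one always yields a subsuming query.

```python
from dataclasses import dataclass
from typing import List, Set, Type, Union

# Hypothetical query model: a document is just the word list of its title,
# and only four node types exist (no NOT, no field names).

@dataclass(frozen=True)
class Term:
    word: str  # matches if the word occurs anywhere in the title

@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"

@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"

@dataclass(frozen=True)
class Within:
    # Assumed semantics of "a W b": word a immediately followed by word b.
    left: str
    right: str

Query = Union[Term, And, Or, Within]


def matches(q: Query, words: List[str]) -> bool:
    """Evaluate a query on one title (used by the simulated source and the filter)."""
    if isinstance(q, Term):
        return q.word in words
    if isinstance(q, And):
        return matches(q.left, words) and matches(q.right, words)
    if isinstance(q, Or):
        return matches(q.left, words) or matches(q.right, words)
    if isinstance(q, Within):
        return any(a == q.left and b == q.right for a, b in zip(words, words[1:]))
    raise TypeError(f"unknown query node: {q!r}")


def subsume(q: Query, supported: Set[Type]) -> Query:
    """Rewrite q with supported node types only, so that every title matching q
    also matches the result (assumes Term, And, Or are always supported)."""
    if isinstance(q, Term):
        return q
    if isinstance(q, (And, Or)):
        return type(q)(subsume(q.left, supported), subsume(q.right, supported))
    if isinstance(q, Within):
        if Within in supported:
            return q
        # Adjacency implies both words occur, so AND is a looser, subsuming query.
        return And(Term(q.left), Term(q.right))
    raise TypeError(f"unknown query node: {q!r}")


def search(titles: List[List[str]], user_query: Query, supported: Set[Type]):
    """Send the subsuming query to the simulated source, then filter locally."""
    sub = subsume(user_query, supported)
    candidates = [t for t in titles if matches(sub, t)]  # may contain extra documents
    # Naive filter: re-evaluate the full user query on every candidate.
    results = [t for t in candidates if matches(user_query, t)]
    return sub, candidates, results


if __name__ == "__main__":
    user_query = And(Term("multiprocessor"), Within("distributed", "system"))
    titles = [
        "distributed system design for multiprocessor machines".split(),
        "multiprocessor scheduling in a system that is distributed".split(),
        "distributed system basics".split(),
    ]
    sub, candidates, results = search(titles, user_query, {Term, And, Or})
    print(sub)
    print(len(candidates), "candidates,", len(results), "result")
```

With these made-up titles, the source receives `multiprocessor AND distributed AND system` and returns the first two titles; the filter then discards the second, in which "distributed" and "system" both occur but not adjacently. The filter here naively re-checks the whole user query. Because the source already guarantees that "multiprocessor" appears, re-checking only the `W` condition would suffice in this case, which hints at why choosing a minimal-cost filter is worth studying.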

Item Type: Techreport (Technical Report)
Uncontrolled Keywords: Boolean queries, query translation, information retrieval, heterogeneity, digital libraries, query subsumption, filtering.
Subjects: Computer Science > Digital Libraries
Computer Science > Query Processing
Projects: Digital Libraries
ID Code: 193
Deposited By: Import Account
Deposited On: 25 Feb 2000 16:00
Last Modified: 08 Dec 2008 15:01
