Stanford InfoLab Publication Server

Distributed and Parallel Computing Issues in Data Warehousing (Invited Talk)

Garcia-Molina, H. and Labio, W. and Wiener, J. and Zhuge, Y. (1998) Distributed and Parallel Computing Issues in Data Warehousing (Invited Talk). Technical Report. Stanford InfoLab. (Publication Note: ACM Principles of Distributed Computing Conference, Invited paper 1998 (Will be published in 1999 proceedings))




In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of diverse and limited query capabilities, not all translations are feasible. In this paper, we study the problem of finding feasible and efficient query plans for mediator systems. We consider conjunctive queries on mediators and model the source capabilities through attribute-binding adornments. We use a simple cost model that focuses on the major costs in mediation systems: those involved in sending queries to sources and getting answers back. Under this metric, we develop two algorithms for source query sequencing: one based on a simple greedy strategy and another based on a partitioning scheme. The first algorithm produces optimal plans in some scenarios, and we show a linear bound on its worst-case performance when it misses optimal plans. The second algorithm generates optimal plans in more scenarios, but has no bound on the margin by which it misses the optimal plans. We also report the results of experiments that study the performance of the two algorithms.

Item Type: Techreport (Technical Report)
Subjects: Computer Science > Data Warehousing
Related URLs: Project Homepage
ID Code: 327
Deposited By: Import Account
Deposited On: 22 Sep 2002 17:00
Last Modified: 29 Dec 2008 10:40
