Stanford InfoLab Publication Server

The Pipelined Set Cover Problem

Munagala, Kamesh and Babu, Shivnath and Motwani, Rajeev and Widom, Jennifer (2003) The Pipelined Set Cover Problem. Technical Report. Stanford.

BibTeXDublinCoreEndNoteHTML

[img]
Preview
PDF
160Kb

Abstract

The <i>pipelined filters</i> problem in databases is the problem of finding the optimal ordering of a set of dependent selection or join operators in a query processor. We provide an abstraction of this problem as a generalization of set cover called <i>pipelined set cover</i>, where the sets are applied sequentially to the universe and the covered elements are discarded. The cost of including a set in pipelined set cover is directly proportional to the number of uncovered elements at the point of applying the set. The cost therefore depends on the ordering of the sets, in addition to the sets themselves. We show that several natural heuristics for this NP-hard problem (such as greedy set cover and local search) can be analyzed using a common linear-programming framework, which bounds not only the approximation ratio, but also the running time of the corresponding algorithms. In particular, we show that the greedy and local search algorithms are 4-approximations for both uniform and nonuniform processing costs, which is in contrast to the logarithmic performance guarantees achievable for classical set cover. We extend our analysis to minimize the Lp-norm of the costs paid by the sets, where p >= 2 is an integer, to examine the improvement in performance when the total cost has increasing contribution from the initial sets in the pipeline. Using a novel Lagrangian-relaxation analysis, we show that the greedy algorithm is a 9^(1/p)-approximation for the uniform processing cost model, while the local search algorithm is a 4^(1/p)-approximation when the costs are nonuniform. Our analysis framework may be applicable to other problems. The algorithms we consider have effective and efficient implementations in a data-stream processing system.

Item Type:Techreport (Technical Report)
Uncontrolled Keywords:data streams, join, set cover
Subjects:Computer Science > Data Streams
Projects:STREAM
Related URLs:Project Homepagehttp://infolab.stanford.edu/stream/
ID Code:620
Deposited By:Import Account
Deposited On:15 Oct 2003 17:00
Last Modified:24 Dec 2008 10:36

Download statistics

Repository Staff Only: item control page