Stanford InfoLab Publication Server

Online Distributed Predicate Evaluation

Feder, Tomas and Motwani, Rajeev and O'Callaghan, Liadan and Panigrahy, Rina and Thomas, Dilys (2003) Online Distributed Predicate Evaluation. Technical Report. Stanford.




An increasing number of databases store not only alphanumeric data, but also images, histograms, and other, more complicated data objects. Querying such databases can involve the evaluation of expensive predicates. Furthermore, since the data from a single database may be stored on a distributed system, certain predicate evaluations may involve remote procedure calls or transfers over a network. Various algorithms exist for efficiently answering queries under such conditions~\cite{chau,chi,hel,mayr}. Here we consider the following formalization: given relations $A$ and $B$, and predicates $P$ and $Q$, find all $(a, b) \in A \times B$ with $P(a)$ and $ Q(b)$. In the ``max'' version of the problem, the goal is to minimize the number of times either $P$ or $Q$ is evaluated, where both may be evaluated simultaneously; in the ``sum'' version it is to minimize the sum of these quantities for $P$ and $Q$. We provide a linear-time 2-approximation for the ``sum'' case and show that no deterministic algorithm can do better. We show a lower bound of 1.5 for the approximation ratio achievable by randomized algorithms and give algorithms achieving this bound for many classes of graphs. For the ``max'' problem, the best known upper bounds for polynomial-time algorithms is 3 for deterministic and 2.67 for randomized algorithms, while 2 is the lower bound for deterministic algorithms. We consider a ``partial max'' problem, where the evaluation of a predicate may be suspended and continued later, and provide a linear-time, deterministic 2-approximation algorithm. For a generalization to $k$-partite graphs we obtain for both ``partial max'' and ``sum'' problems a tight $k$-approximation deterministic algorithm, and give better approximations in many special cases using randomization. We also generalize the deterministic 3 upper bound for the ``max'' problem to a bound of $O(k\log^2 k)$ on $k$-partite graphs.

Item Type:Techreport (Technical Report)
Uncontrolled Keywords:Online Algorithms, Distributed Databases.
Subjects:Computer Science > Query Processing
Related URLs:Project Homepage
ID Code:633
Deposited By:Import Account
Deposited On:06 Jun 2004 17:00
Last Modified:24 Dec 2008 09:46

Download statistics

Repository Staff Only: item control page