Whang, Steven and Brower, Chad and Shanmugasundaram, Jayavel and Vassilvitskii, Sergei and Vee, Erik and Yerneni, Ramana and Garcia-Molina, Hector (2009) Indexing Boolean Expressions. In: VLDB 2009, Aug. 24-28, 2009, Lyon, France.
BibTeX | DublinCore | EndNote | HTML |
| PDF - Published Version 224Kb |
Abstract
We consider the problem of efficiently indexing Disjunctive Normal Form (DNF) and Conjunctive Normal Form (CNF) Boolean expressions over a high-dimensional multi-valued attribute space. The goal is to rapidly find the set of Boolean expressions that evaluate to true for a given assignment of values to attributes. A solution to this problem has applications in online advertising (where a Boolean expression represents an advertiser's user targeting requirements, and an assignment of values to attributes represents the characteristics of a user visiting an online page) and in general any publish/subscribe system (where a Boolean expression represents a subscription, and an assignment of values to attributes represents an event). All existing solutions that we are aware of can only index a specialized sub-set of conjunctive and/or disjunctive expressions, and cannot efficiently handle general DNF and CNF expressions (including NOTs) over multi-valued attributes. In this paper, we present a novel solution based on the inverted list data structure that enables us to index arbitrarily complex DNF and CNF Boolean expressions over multi-valued attributes. An interesting aspect of our solution is that, by virtue of leveraging inverted lists traditionally used for ranked information retrieval, we can efficiently return the top-N matching Boolean expressions. This capability enables emerging applications such as {\em ranked} publish/subscribe systems~\cite{Mach08}, where only the top subscriptions that match an event are desired. For example, in online advertising there is a limit on the number of advertisements that can be shown on a given page and only the ``best'' advertisements can be displayed. We have evaluated our proposed technique based on data from an online advertising application, and the results show a dramatic performance improvement over prior techniques.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Projects: | Miscellaneous |
ID Code: | 927 |
Deposited By: | Steven Whang |
Deposited On: | 01 Jun 2009 21:52 |
Last Modified: | 04 Aug 2009 17:22 |
Download statistics
Repository Staff Only: item control page