Das Sarma, Anish and Agrawal, Parag and Nabar, Shubha and Widom, Jennifer (2008) Towards Special-Purpose Indexes and Statistics for Uncertain Data. In: MUD.
Full text: PDF (accepted version)
The Trio project at Stanford for managing data, uncertainty, and lineage is built on top of a conventional DBMS. Uncertain data with lineage is encoded in relational tables, and Trio queries are translated into SQL queries over that encoding. This layered approach brings significant benefits: architectural simplicity and the ability to use an off-the-shelf query processing engine. In this paper, we present special-purpose indexes and statistics that complement the layered approach and further improve its performance. First, we identify a well-defined structure in Trio queries, relations, and their encoding that the underlying query optimizer can exploit to improve performance under Trio's layered approach. We propose several mechanisms for indexing Trio's uncertain relations and study when these indexes are useful. We then present an interesting order, and an associated operator, that are especially useful to consider when composing query plans. The choice of query plan for a Trio query is dictated by various statistical properties of the input data. We identify the statistics that can guide the underlying optimizer, and design histograms that allow these statistics to be estimated accurately.
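To make the layered idea concrete, here is a minimal sketch, under stated assumptions, of flattening uncertain tuples into one relational row per alternative, with three things built over that encoding: a secondary index, a per-tuple aggregate computed in a single pass over rows ordered by tuple identifier, and a simple histogram statistic. The row layout (`xid`, `aid`, `values`, `conf`), the function names, and the choice of ordering by `xid` as the "interesting order" are all hypothetical illustrations; they are not the encoding, indexes, order, operator, or histograms defined in the paper. The sketch also assumes ULDB-style semantics, in which an x-tuple's alternatives are mutually exclusive and a confidence total below 1 means the x-tuple may be absent.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class EncodedRow:
    xid: int       # hypothetical x-tuple identifier
    aid: int       # hypothetical alternative identifier within its x-tuple
    values: tuple  # attribute values of this alternative
    conf: float    # confidence attached to this alternative


def encode(x_relation):
    """Flatten x-tuples (lists of (values, conf) alternatives) into one row per alternative."""
    return [
        EncodedRow(xid, aid, values, conf)
        for xid, alternatives in enumerate(x_relation)
        for aid, (values, conf) in enumerate(alternatives)
    ]


def build_xid_index(rows):
    """Map each xid to the row positions of its alternatives (a simple secondary index)."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row.xid].append(pos)
    return dict(index)


def xtuple_confidences(rows):
    """Total confidence per x-tuple, computed in one pass over xid-ordered rows.

    Sorting establishes the xid order (Timsort is linear on already-sorted input);
    groupby then sees each x-tuple's alternatives as one contiguous run.
    """
    ordered = sorted(rows, key=attrgetter("xid"))
    return {
        xid: sum(r.conf for r in group)
        for xid, group in groupby(ordered, key=attrgetter("xid"))
    }


def alternatives_histogram(rows):
    """Histogram mapping 'number of alternatives' -> 'number of x-tuples with that many'."""
    per_xtuple = Counter(row.xid for row in rows)
    return dict(sorted(Counter(per_xtuple.values()).items()))


if __name__ == "__main__":
    # Illustrative data: Alice drives exactly one of two cars; Bob's tuple may be absent.
    x_relation = [
        [(("Alice", "Jeep"), 0.75), (("Alice", "Honda"), 0.25)],
        [(("Bob", "BMW"), 0.5)],
    ]
    rows = encode(x_relation)
    print(build_xid_index(rows))         # {0: [0, 1], 1: [2]}
    print(xtuple_confidences(rows))      # {0: 1.0, 1: 0.5}
    print(alternatives_histogram(rows))  # {1: 1, 2: 1}
```

The point of the ordered pass is that, once rows arrive sorted on the tuple identifier, per-tuple work needs no hash table or second scan. A statistic such as the alternatives-per-tuple histogram is the kind of input an optimizer could use to estimate the sizes of such groups when costing plans.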
| Field | Value |
|---|---|
| Item Type | Conference or Workshop Item (Paper) |
| Related URLs | Project Homepage: http://infolab.stanford.edu/ |
| Deposited By | Parag Agrawal |
| Deposited On | 04 Jun 2008 17:00 |
| Last Modified | 05 Jan 2009 16:18 |