Stanford InfoLab Publication Server

Computing Iceberg Queries Efficiently.

Fang, Min and Shivakumar, Narayanan and Garcia-Molina, Hector and Motwani, Rajeev and Ullman, Jeffrey D. (1999) Computing Iceberg Queries Efficiently. Technical Report. Stanford InfoLab. (Publication Note: International Conference on Very Large Databases (VLDB'98), New York, August 1998)




Many applications compute aggregate functions (such as COUNT, SUM) over an attribute (or set of attributes) to find aggregate values above some specified threshold. We call such queries iceberg queries because the number of above-threshold results is often very small (the tip of an iceberg), relative to the large amount of input data (the iceberg). Such iceberg queries are common in many applications, including data warehousing, information-retrieval, market basket analysis in data mining, clustering and copy detection. We propose efficient algorithms to evaluate iceberg queries using very little memory and significantly fewer passes over data, as compared to current techniques that use sorting or hashing. We present an experimental case study using over three gigabytes of Web data to illustrate the savings obtained by our algorithms.

Item Type:Techreport (Technical Report)
Additional Information:Previous number = SIDL-WP-1999-0121
Subjects:Computer Science > Digital Libraries
Projects:Digital Libraries
Related URLs:Project Homepage
ID Code:423
Deposited By:Import Account
Deposited On:30 Oct 2001 16:00
Last Modified:28 Dec 2008 08:56

Download statistics

Repository Staff Only: item control page