Fang, M. and Shivakumar, N. and Garcia-Molina, H. and Motwani, R. and Ullman, J. (1998) Computing iceberg queries efficiently. In: 24th International Conference on Very Large Data Bases (VLDB 1998), August 24-27, 1998, New York, NY.
Abstract
Many applications compute aggregate functions over an attribute (or set of attributes) to find aggregate values above some specified threshold. We call such queries iceberg queries, because the number of above-threshold results is often very small (the tip of an iceberg), relative to the large amount of input data (the iceberg). Such iceberg queries are common in many applications, including data warehousing, information retrieval, market basket analysis in data mining, clustering and copy detection. We propose efficient algorithms to evaluate iceberg queries using very little memory and significantly fewer passes over data, when compared to current techniques that use sorting or hashing. We present an experimental case study using over three gigabytes of Web data to illustrate the savings obtained by our algorithms.
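To make the definition concrete, here is a minimal Python sketch of an iceberg query using COUNT as the aggregate: keep only the values that occur at least `threshold` times. The function names, the `num_buckets` parameter, and the sample data are assumptions made for illustration. The second function shows one generic way to trade an extra pass for less memory (hash values into a small array of bucket counters first, then count exactly only the values that fall into heavy buckets); it is not presented as the algorithms proposed in the paper.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def iceberg_count(rows: Sequence[Hashable], threshold: int) -> Dict[Hashable, int]:
    """Naive baseline: one exact counter per distinct value."""
    counts = Counter(rows)  # memory grows with the number of distinct values
    return {value: n for value, n in counts.items() if n >= threshold}


def iceberg_count_two_pass(
    rows: Sequence[Hashable], threshold: int, num_buckets: int = 1024
) -> Dict[Hashable, int]:
    """Illustrative two-pass variant (hypothetical helper, not from the paper)."""
    # Pass 1: coarse counts per hash bucket, using a fixed amount of memory.
    buckets = [0] * num_buckets
    for value in rows:
        buckets[hash(value) % num_buckets] += 1

    # Pass 2: exact counters only for values in buckets that reached the
    # threshold. A value occurring >= threshold times always lands in such
    # a bucket, so no qualifying value is lost.
    candidates = Counter(
        value for value in rows if buckets[hash(value) % num_buckets] >= threshold
    )
    return {value: n for value, n in candidates.items() if n >= threshold}


# Hypothetical sample data: only "a" and "c" reach the threshold of 3.
sample = ["a"] * 5 + ["b"] * 2 + ["c"] * 3 + ["d"]
result = iceberg_count(sample, threshold=3)
result_two_pass = iceberg_count_two_pass(sample, threshold=3, num_buckets=2)
```

Both functions return the same answer; the second one only holds exact counters for candidate values, which matters when most distinct values are far below the threshold.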
Item Type: | Conference or Workshop Item (Paper) | |
---|---|---|
Uncontrolled Keywords: | Aggregate queries, SCAM, data-mining | |
Subjects: | Computer Science > Data Mining; Computer Science > Data Integration and Mediation | |
Projects: | Information Integration | |
Related URLs: | Project Homepage | http://infolab.stanford.edu/serf/ |
ID Code: | 326 | |
Deposited By: | Import Account | |
Deposited On: | 25 Feb 2000 16:00 | |
Last Modified: | 29 Dec 2008 09:34 | |