Babu, Shivnath and Garofalakis, Minos and Rastogi, Rajeev and Silberschatz, Avi (2001) Model-Based Semantic Compression for Network-Data Tables. In: Workshop on Network-Related Data Management (NRDM 2001), May 25, 2001, Santa Barbara, California.
Abstract
While a variety of lossy compression schemes have been developed for certain forms of digital data (e.g., images, audio, video), the area of lossy compression techniques for arbitrary data tables has been left relatively unexplored. Nevertheless, such techniques are clearly motivated by the ever-increasing data collection rates of modern enterprises and the need for effective, guaranteed-quality approximate answers to queries over massive relational data sets. In this paper, we propose Model-Based Semantic Compression (MBSC), a novel data compression framework that takes advantage of attribute semantics and data-mining models to perform lossy compression of massive data tables. We describe the architecture and algorithms underlying SPARTAN, a model-based semantic compression system that exploits predictive data correlations and prescribed error tolerances for individual attributes to construct concise and accurate Classification and Regression Tree (CaRT) models for entire columns of a table. Our experimentation with several real-life data sets has offered convincing evidence of the effectiveness of SPARTAN's model-based approach -- SPARTAN is able to consistently yield substantially better compression ratios than existing semantic or syntactic compression tools (e.g., gzip) while utilizing only small data samples for model inference. Several promising directions for future research and possible applications of MBSC in the context of network management are identified and discussed.
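The core idea in the abstract, predicting one column from another within a prescribed error tolerance and storing only the rows the model gets wrong, can be illustrated with a toy sketch. This is not SPARTAN's algorithm: a one-split regression tree (a "stump") stands in for a full CaRT model, it is fitted on the whole column rather than a sample, and the column names, data values, and function names are all hypothetical.

```python
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree predicting ys from xs (toy stand-in for a CaRT)."""
    best = None
    # Candidate thresholds: every distinct x except the smallest, so both sides are non-empty.
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # only one distinct x value: fall back to a constant model
        m = mean(ys)
        return xs[0], m, m
    _, t, lm, rm = best
    return t, lm, rm

def predict(model, x):
    t, left_value, right_value = model
    return left_value if x < t else right_value

def compress_column(xs, ys, tolerance):
    """Replace column ys by a model over xs plus explicit outliers beyond the tolerance."""
    model = fit_stump(xs, ys)
    outliers = {i: y for i, (x, y) in enumerate(zip(xs, ys))
                if abs(predict(model, x) - y) > tolerance}
    return model, outliers

def decompress_column(xs, model, outliers):
    """Rebuild an approximate column: stored outliers are exact, the rest are predicted."""
    return [outliers.get(i, predict(model, x)) for i, x in enumerate(xs)]

# Hypothetical network-data table: packets per flow (predictor) and bytes per flow (predicted).
packets = [1, 2, 3, 4, 10, 11, 12, 13]
bytes_ = [100, 102, 98, 100, 500, 504, 496, 540]
tolerance = 15

model, outliers = compress_column(packets, bytes_, tolerance)
restored = decompress_column(packets, model, outliers)
print(model, outliers, restored)
```

Every restored value is within the tolerance of the original, and the predicted column is represented by three model parameters plus the outlier rows instead of one value per row. A real system would also have to decide which columns to predict and which to keep as predictors, which this sketch does not attempt.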
Item Type: | Conference or Workshop Item (Paper) | |
---|---|---|
Additional Information: | An extended discussion of the technical results in this abstract appears in the Proceedings of the ACM SIGMOD 2001 International Conference on Management of Data | |
Subjects: | Computer Science > Data Mining Miscellaneous | |
Projects: | Miscellaneous | |
Related URLs: | Project Homepage | http://infolab.stanford.edu/ |
ID Code: | 495 | |
Deposited By: | Import Account | |
Deposited On: | 30 May 2001 17:00 | |
Last Modified: | 27 Dec 2008 10:56 | |