Labio, W. and Wiener, J. and Garcia-Molina, H. and Gorelik, V. (1999) Efficient Resumption of Interrupted Warehouse Loads. Technical Report. Stanford.
Abstract
Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes gigabytes of data, takes hours or even days to execute, and involves many complex and user-defined transformations of the data (e.g., finding duplicates, resolving data inconsistencies, and adding unique keys). If the load fails, one possible approach is to "redo" the entire load. A better approach is to resume the incomplete load from where it was interrupted. Unfortunately, traditional algorithms for resuming the load either impose unacceptable overhead during normal operation or rely on the specifics of simple transformations. We develop a resumption algorithm called DR that imposes no overhead and relies only on the basic properties of the transformations. We show experimentally that DR can lead to almost a ten-fold reduction in resumption time.
Item Type: | Techreport (Technical Report) | |
---|---|---|
Uncontrolled Keywords: | Recovery, Data Warehouse, Transformation | |
Subjects: | Computer Science > Data Warehousing | |
Projects: | WHIPS | |
Related URLs: | Project Homepage | http://infolab.stanford.edu/warehousing/warehouse.html |
ID Code: | 407 | |
Deposited By: | Import Account | |
Deposited On: | 25 Feb 2000 16:00 | |
Last Modified: | 28 Dec 2008 09:21 |