Cui, Yingwei (2001) Lineage Tracing in Data Warehouses. PhD thesis, Stanford University.
Data warehousing systems collect data from multiple distributed data sources and store integrated and summarized information in local databases for efficient data analysis and mining. Sometimes, when analyzing data at a warehouse, it is useful to "drill down" and investigate the source data from which certain warehouse data was derived. For a given warehouse data item, identifying the exact set of source data items that produced the warehouse data item is termed the "data lineage" problem. This thesis presents our research results on tracing data lineage in a warehousing environment:
1. Formal definitions of data lineage for data warehouses defined as relational materialized views over relational sources, and for warehouses defined using graphs of general data transformations.
2. Algorithms for lineage tracing, again considering both relational and transformational warehouses, along with a suite of optimization techniques.
3. Performance evaluations through simulations, and a lineage tracing prototype developed within the WHIPS (WareHousing Information Processing System) project at Stanford.
4. An application of data lineage techniques that yields improved algorithms for the well-known database view update problem.
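The lineage idea described in the abstract can be illustrated with a small sketch. This is not one of the thesis's algorithms. It assumes a hypothetical source table `SALES` and a hypothetical warehouse view that sums amounts per store, and the names `build_view` and `trace_lineage` are invented for illustration. For such a simple aggregate view, the lineage of a view tuple is just the set of source rows sharing its group key.

```python
from collections import defaultdict

# Hypothetical source table (an assumption for illustration): (store, item, amount)
SALES = [
    ("palo_alto", "pen", 3),
    ("palo_alto", "ink", 5),
    ("san_jose", "pen", 2),
]


def build_view(rows):
    """Materialize a summary view: total amount per store."""
    totals = defaultdict(int)
    for store, _item, amount in rows:
        totals[store] += amount
    return dict(totals)


def trace_lineage(rows, store):
    """Return the source rows that contributed to the view tuple for `store`.

    For a group-by aggregate, these are the rows whose group key matches.
    """
    return [row for row in rows if row[0] == store]


view = build_view(SALES)
lineage = trace_lineage(SALES, "palo_alto")
```

Here `lineage` holds the source rows behind the `palo_alto` entry of `view`, which is the kind of "drill down" the abstract refers to. Views built from joins or chains of general transformations need more care than this single-table case.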
|Item Type:||Thesis (PhD)|
|Uncontrolled Keywords:||data lineage view derivation view update|
|Subjects:||Computer Science > Data Warehousing|
|Related URLs:||Project Homepage||http://infolab.stanford.edu/warehousing/warehouse.html|
|Deposited By:||Import Account|
|Deposited On:||02 Dec 2001 16:00|
|Last Modified:||27 Dec 2008 09:49|