Stanford InfoLab Publication Server

Lineage Tracing in Data Warehouses

Cui, Yingwei (2001) Lineage Tracing in Data Warehouses. PhD thesis, Stanford University.

Data warehousing systems collect data from multiple distributed data sources and store integrated and summarized information in local databases for efficient data analysis and mining. Sometimes, when analyzing data at a warehouse, it is useful to "drill down" and investigate the source data from which certain warehouse data was derived. For a given warehouse data item, identifying the exact set of source data items that produced the warehouse data item is termed the "data lineage" problem. This thesis presents our research results on tracing data lineage in a warehousing environment:

1. Formal definitions of data lineage for data warehouses defined as relational materialized views over relational sources, and for warehouses defined using graphs of general data transformations.
2. Algorithms for lineage tracing, again considering both relational and transformational warehouses, along with a suite of optimization techniques.
3. Performance evaluations through simulations, and a lineage tracing prototype developed within the WHIPS (WareHousing Information Processing System) project at Stanford.
4. An application of data lineage techniques to obtain improved algorithms for the well-known database view update problem.
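To make the "data lineage" problem concrete, the following is a minimal sketch and not the thesis's algorithm. It assumes a hypothetical source table `SALES` and a hypothetical warehouse view that sums sale amounts per store after a selection (`MIN_AMOUNT` is an invented threshold). For one view row, the lineage is taken to be the source rows that passed the selection and fell into that row's group.

```python
from collections import defaultdict

# Hypothetical source table (assumption, not from the thesis):
# each row is (sale_id, store, amount).
SALES = [
    (1, "palo_alto", 30),
    (2, "palo_alto", 5),
    (3, "san_jose", 20),
    (4, "palo_alto", 50),
    (5, "san_jose", 8),
]

MIN_AMOUNT = 10  # hypothetical selection predicate of the view


def build_view(sales):
    """Materialize the view:
    SELECT store, SUM(amount) FROM sales WHERE amount >= MIN_AMOUNT GROUP BY store
    """
    totals = defaultdict(int)
    for _sale_id, store, amount in sales:
        if amount >= MIN_AMOUNT:
            totals[store] += amount
    return dict(totals)


def trace_lineage(sales, store):
    """Return the source rows that contributed to the view row for `store`.

    A source row contributes if it satisfies the view's selection and
    belongs to the same group as the view row being traced.
    """
    return [row for row in sales if row[1] == store and row[2] >= MIN_AMOUNT]


view = build_view(SALES)
lineage = trace_lineage(SALES, "palo_alto")
```

Recomputing the view over the traced rows alone reproduces the traced view row, which is a quick sanity check that the lineage is neither missing rows nor including irrelevant ones.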

Item Type: Thesis (PhD)
Uncontrolled Keywords: data lineage, view derivation, view update
Subjects: Computer Science > Data Warehousing
Related URLs: Project Homepage
ID Code: 522
Deposited By: Import Account
Deposited On: 02 Dec 2001 16:00
Last Modified: 27 Dec 2008 09:49
