Stanford InfoLab Publication Server

Lineage Tracing in a Data Warehousing System

Cui, Y. and Widom, J. (1999) Lineage Tracing in a Data Warehousing System. Technical Report. Stanford InfoLab. (Publication Note: 16th International Conference on Data Engineering (ICDE 2000), San Diego, California, February 28 - March 3, 2000)




Lineage Tracing in a Data Warehousing System (Demonstration Proposal)
Yingwei Cui and Jennifer Widom
Stanford University

A data warehousing system collects data from multiple distributed sources and stores the integrated information as materialized views in a local data warehouse. Users then perform data analysis and mining on the warehouse views. Figure 1 shows the basic architecture of a data warehousing system.

In many cases, the warehouse view contents alone are not sufficient for in-depth analysis. It is often useful to be able to "drill through" from interesting (or potentially erroneous) view data to the original source data that derived the view data. For a given view data item, identifying the exact set of base data items that produced the view data item is termed the view data lineage problem. Motivation for and applications of lineage tracing in a warehousing environment are provided in [2]. In the context of the WHIPS data warehousing project at Stanford [3], we have developed a complete prototype that performs efficient and consistent lineage tracing.

Some commercial data warehousing systems support schema-level lineage tracing, or provide specialized drill-down and/or drill-through facilities for multi-dimensional warehouse views. Our lineage tracing prototype supports more fine-grained instance-level lineage tracing for arbitrarily complex relational views, including aggregation. Our prototype automatically generates lineage tracing procedures and supporting auxiliary views at view definition time. At lineage tracing time, the system applies the tracing procedures to the source tables and/or auxiliary views to obtain the lineage results and show the specific view data derivation process.

1 Lineage Tracing System

1.1 Lineage Example

Given a view data item I, the exact set of source data that produced I is called I's lineage. We use an example to illustrate the concepts; a full formalization of the problem along with solutions.
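To make the notion of instance-level lineage concrete, the following is a minimal Python sketch for a single select-then-aggregate view. It is an illustration only, not the tracing procedure generated by the WHIPS prototype: the SALES table, its columns, the min_amount filter, and the function names total_by_store and trace_lineage are all hypothetical and do not appear in the report.

```python
from collections import defaultdict

# Hypothetical source table of (store, item, amount) rows; not from the report.
SALES = [
    ("palo_alto", "pencil", 30),
    ("palo_alto", "binder", 50),
    ("san_jose", "pencil", 20),
    ("san_jose", "pen", 5),
    ("palo_alto", "pen", 8),
]


def total_by_store(rows, min_amount=10):
    """Hypothetical view: total amount per store over rows with amount >= min_amount."""
    totals = defaultdict(int)
    for store, _item, amount in rows:
        if amount >= min_amount:
            totals[store] += amount
    return dict(totals)


def trace_lineage(rows, store, min_amount=10):
    """Return the source rows that contributed to the view tuple for `store`.

    For this simple view, a source row contributes exactly when it passes
    the selection and falls into the same group as the view tuple.
    """
    return [r for r in rows if r[0] == store and r[2] >= min_amount]


view = total_by_store(SALES)
lineage = trace_lineage(SALES, "palo_alto")
```

Under these assumptions, recomputing the view over the traced rows alone reproduces the traced view tuple, which is a useful sanity check on any lineage result. The sketch rescans the source table at tracing time; the report instead describes generating tracing procedures and auxiliary views at view definition time.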

Item Type: Techreport (Technical Report)
Uncontrolled Keywords: view data lineage, derivation tree
Subjects: Computer Science > Data Warehousing
Related URLs: Project Homepage
ID Code: 403
Deposited By: Import Account
Deposited On: 25 Feb 2000 16:00
Last Modified: 27 Dec 2008 21:21
