Ikeda, Robert and Das Sarma, Akash and Widom, Jennifer. Logical Provenance in Data-Oriented Workflows (Long Version). Technical Report. Stanford InfoLab.
We consider the problem of defining, generating, and tracing provenance in data-oriented workflows, in which input data sets are processed by a graph of transformations to produce output results. We first give a new general definition of provenance for general transformations, introducing the notions of correctness, precision, and minimality. We then determine when properties such as correctness and minimality carry over from the individual transformations' provenance to the workflow provenance. We describe a simple logical-provenance specification language consisting of attribute mappings and filters. We provide algorithms for provenance tracing in workflows where logical provenance for each transformation is specified using our language. We consider logical provenance in the relational setting, showing that for a class of Select-Project-Join (SPJ) transformations, logical provenance specifications encode minimal provenance. We have built a prototype system supporting the features and algorithms presented in the paper, and we report a few preliminary experimental results.
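The abstract describes logical provenance specified by attribute mappings and filters, and traced backward through a workflow. The sketch below shows one plausible reading of that idea. It is not the report's specification language or algorithm; the semantics are assumptions. Here a mapping is a list of hypothetical `(input_attr, output_attr)` pairs whose values must be equal, a filter is a predicate that an input record must also satisfy, and workflow tracing simply applies the one-step trace stage by stage, from the last transformation back to the first. `LogicalSpec`, `trace_one`, and `trace_workflow` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


@dataclass
class LogicalSpec:
    """Hypothetical logical-provenance spec for one transformation (assumed semantics).

    mapping: (input_attr, output_attr) pairs; an input record is in the provenance
             of an output record only if each paired value is equal.
    filter:  extra predicate the input record must satisfy (default: accept all).
    """
    mapping: List[Tuple[str, str]]
    filter: Callable[[Record], bool] = lambda r: True


def trace_one(spec: LogicalSpec, out: Record, inputs: List[Record]) -> List[Record]:
    # Input records that agree with `out` on every mapped attribute and pass the filter.
    return [r for r in inputs
            if all(r[a] == out[b] for a, b in spec.mapping) and spec.filter(r)]


def trace_workflow(stages: List[Tuple[LogicalSpec, List[Record]]],
                   outputs: List[Record]) -> List[Record]:
    """Trace outputs backward through a linear workflow.

    stages: (spec, input_records) in forward order; stage k+1 consumes stage k's output.
    Returns the first stage's input records that the given outputs trace back to.
    """
    current = outputs
    for spec, inputs in reversed(stages):
        traced: List[Record] = []
        for o in current:
            for r in trace_one(spec, o, inputs):
                if r not in traced:  # dicts are unhashable, so deduplicate by equality
                    traced.append(r)
        current = traced
    return current


# Example workflow: T1 selects region == "west" and projects (store, amount);
# T2 groups by store and sums amount.
sales = [
    {"store": "A", "region": "west", "amount": 10},
    {"store": "A", "region": "west", "amount": 20},
    {"store": "A", "region": "east", "amount": 99},
    {"store": "B", "region": "west", "amount": 5},
]
west = [{"store": r["store"], "amount": r["amount"]}
        for r in sales if r["region"] == "west"]

totals: Dict[object, int] = {}
for r in west:
    totals[r["store"]] = totals.get(r["store"], 0) + r["amount"]
outputs = [{"store": s, "total": t} for s, t in totals.items()]

t1 = LogicalSpec(mapping=[("store", "store"), ("amount", "amount")],
                 filter=lambda r: r["region"] == "west")
t2 = LogicalSpec(mapping=[("store", "store")])
stages = [(t1, sales), (t2, west)]

result = trace_workflow(stages, [{"store": "A", "total": 30}])
print(result)
# [{'store': 'A', 'region': 'west', 'amount': 10}, {'store': 'A', 'region': 'west', 'amount': 20}]
```

Tracing the output `{"store": "A", "total": 30}` through T2 yields the two intermediate `west` records for store A, and tracing those through T1 yields the two matching `sales` records. The `east` record for store A is excluded by the filter. Whether such one-step specifications compose into correct or minimal workflow provenance is exactly the question the report studies; this sketch does not establish it.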
Item Type: Technical Report
Deposited By: Robert Ikeda
Deposited On: 06 Jun 2012 12:22
Last Modified: 06 Jun 2012 12:22