Stanford InfoLab Publication Server

Provenance-Based Debugging and Drill-Down in Data-Oriented Workflows

Ikeda, Robert and Cho, Junsang and Fang, Charlie and Salihoglu, Semih and Torikai, Satoshi and Widom, Jennifer. Provenance-Based Debugging and Drill-Down in Data-Oriented Workflows. In: ICDE 2012.




Panda (for Provenance and Data) is a system that supports the creation and execution of data-oriented workflows, with automatic provenance generation and built-in provenance tracing operations. Workflows in Panda are arbitrary acyclic graphs containing both relational (SQL) processing nodes and opaque processing nodes programmed in Python. For both types of nodes, Panda generates logical provenance (provenance information stored at the processing-node level) and uses it to support record-level backward tracing and forward tracing operations. In our demonstration we use Panda to integrate, process, and analyze actual education data from multiple sources. We specifically demonstrate how Panda's provenance generation and tracing capabilities help with workflow debugging and with drilling down on specific results of interest.
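The idea of node-level provenance driving record-level tracing can be illustrated with a minimal Python sketch. This is not Panda's API: the `Node` class, the `contributes` predicate, the `backward_trace` and `forward_trace` helpers, and the sample student records are all hypothetical names and data invented for illustration. The sketch assumes that each processing node stores a single predicate relating an output record to the input records that contributed to it, and that tracing evaluates this predicate over the node's stored inputs or outputs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Node:
    """Hypothetical workflow node carrying node-level (logical) provenance."""
    name: str
    transform: Callable[[List[Record]], List[Record]]
    # Assumed provenance form: True if the input record contributed to the output record.
    contributes: Callable[[Record, Record], bool]


def backward_trace(node: Node, output_record: Record, inputs: List[Record]) -> List[Record]:
    """Return the input records that contributed to one output record."""
    return [r for r in inputs if node.contributes(output_record, r)]


def forward_trace(node: Node, input_record: Record, outputs: List[Record]) -> List[Record]:
    """Return the output records that one input record contributed to."""
    return [o for o in outputs if node.contributes(o, input_record)]


def avg_by_school(rows: List[Record]) -> List[Record]:
    """Opaque processing step: average score per school."""
    scores: Dict[object, List[float]] = {}
    for r in rows:
        scores.setdefault(r["school"], []).append(r["score"])
    return [{"school": s, "avg_score": sum(v) / len(v)} for s, v in scores.items()]


# The provenance predicate is declared once for the node, not stored per record.
node = Node(
    name="avg_by_school",
    transform=avg_by_school,
    contributes=lambda out, inp: out["school"] == inp["school"],
)

# Made-up sample data, for illustration only.
students = [
    {"id": 1, "school": "A", "score": 90},
    {"id": 2, "school": "A", "score": 70},
    {"id": 3, "school": "B", "score": 80},
]

results = node.transform(students)
back = backward_trace(node, results[0], students)  # drill down on school A's average
fwd = forward_trace(node, students[2], results)    # where did student 3 end up?
```

Backward tracing of this kind corresponds to the drill-down use case (which input records explain a suspicious result), while forward tracing corresponds to the debugging use case (which results are affected by a bad input record).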

Item Type: Conference or Workshop Item (Paper)
ID Code: 1008
Deposited By: Robert Ikeda
Deposited On: 26 Jul 2011 16:38
Last Modified: 21 Oct 2011 16:58
