Park, Hyunjung and Ikeda, Robert and Widom, Jennifer (2011) RAMP: A System for Capturing and Tracing Provenance in MapReduce Workflows. In: 37th International Conference on Very Large Data Bases (VLDB), Seattle, Washington.
RAMP (Reduce And Map Provenance) is an extension to Hadoop that supports provenance capture and tracing for workflows of MapReduce jobs. RAMP uses a wrapper-based approach that requires little, if any, user intervention in most cases and retains Hadoop’s parallel execution and fault tolerance. We demonstrate RAMP on a real-world MapReduce workflow generated from a Pig script that performs sentiment analysis over Twitter data. We show how RAMP’s automatic provenance capture and tracing capabilities provide a convenient and efficient means of drilling down into output elements and verifying them.
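As a rough illustration of what a wrapper-based approach to provenance capture can look like, the following is a minimal single-process sketch in Python. It is not RAMP's implementation and does not use Hadoop; the names wrap_mapper, wrap_reducer, run_job, and trace, as well as the word-count job, are hypothetical and chosen only for this example. The idea shown is that user-supplied map and reduce functions stay unchanged while wrappers attach input IDs to intermediate values and collect them per output element.

```python
from collections import defaultdict

def wrap_mapper(map_fn):
    """Wrap a map function so each emitted value carries its input's ID."""
    def wrapped(input_id, record):
        for key, value in map_fn(record):
            yield key, (value, input_id)
    return wrapped

def wrap_reducer(reduce_fn):
    """Wrap a reduce function so each output lists the input IDs of its group."""
    def wrapped(key, tagged_values):
        values = [value for value, _ in tagged_values]
        sources = sorted({input_id for _, input_id in tagged_values})
        for output in reduce_fn(key, values):
            yield output, sources
    return wrapped

def run_job(records, map_fn, reduce_fn):
    """Run one toy map-reduce job and return {output element: input IDs}."""
    mapper = wrap_mapper(map_fn)
    reducer = wrap_reducer(reduce_fn)
    groups = defaultdict(list)
    for input_id, record in enumerate(records):
        for key, tagged in mapper(input_id, record):
            groups[key].append(tagged)
    provenance = {}
    for key in sorted(groups):
        for output, sources in reducer(key, groups[key]):
            provenance[output] = sources
    return provenance

def trace(records, provenance, output):
    """Backward tracing: return the input records behind one output element."""
    return [records[i] for i in provenance[output]]

# Hypothetical user code: an ordinary word count, unaware of provenance.
def word_map(line):
    for word in line.split():
        yield word, 1

def word_reduce(key, values):
    yield (key, sum(values))

tweets = ["good movie", "bad movie", "good good day"]
provenance = run_job(tweets, word_map, word_reduce)
print(trace(tweets, provenance, ("good", 3)))
```

In this sketch, tracing the output element ("good", 3) returns the two input lines that contain "good", which is the kind of drill-down the abstract describes. A real system would also have to chain provenance across multiple jobs in a workflow and store it in a distributed manner, which this example omits.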
Item Type: Conference or Workshop Item (Paper)
Deposited By: Hyunjung Park
Deposited On: 22 Mar 2011 23:49
Last Modified: 17 Jul 2011 11:03