Stanford InfoLab Publication Server

RAMP: A System for Capturing and Tracing Provenance in MapReduce Workflows

Park, Hyunjung and Ikeda, Robert and Widom, Jennifer (2011) RAMP: A System for Capturing and Tracing Provenance in MapReduce Workflows. In: 37th International Conference on Very Large Data Bases (VLDB), Seattle, Washington.

Abstract

RAMP (Reduce And Map Provenance) is an extension to Hadoop that supports provenance capture and tracing for workflows of MapReduce jobs. RAMP uses a wrapper-based approach, requiring little if any user intervention in most cases, while retaining Hadoop’s parallel execution and fault tolerance. We demonstrate RAMP on a real-world MapReduce workflow generated from a Pig script that performs sentiment analysis over Twitter data. We show how RAMP’s automatic provenance capture and tracing capabilities provide a convenient and efficient means of drilling down and verifying output elements.
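
To make the wrapper idea in the abstract concrete, here is a minimal in-memory sketch in Python. It is not RAMP's implementation (RAMP extends Hadoop) and it does not use any Hadoop API. All names (wrap_map, wrap_reduce, run_job, trace_back) and the word-count job are hypothetical, chosen only for illustration. The sketch assumes provenance is recorded as the set of input element IDs that contributed to each output key, and that the user's map and reduce functions are left unchanged.

```python
from collections import defaultdict


def wrap_map(map_fn):
    """Wrap a user map function so each emitted pair carries its input ID."""
    def wrapped(input_id, record):
        return [(key, value, input_id) for key, value in map_fn(record)]
    return wrapped


def wrap_reduce(reduce_fn):
    """Wrap a user reduce function so each output lists contributing input IDs."""
    def wrapped(key, annotated_values):
        # Strip the annotations before calling the unmodified user function.
        values = [value for value, _ in annotated_values]
        sources = sorted({src for _, src in annotated_values})
        return [(k, v, sources) for k, v in reduce_fn(key, values)]
    return wrapped


def run_job(records, map_fn, reduce_fn):
    """Run one map/reduce pass in memory, capturing provenance on the side."""
    mapper, reducer = wrap_map(map_fn), wrap_reduce(reduce_fn)
    groups = defaultdict(list)
    for input_id, record in enumerate(records):
        for key, value, src in mapper(input_id, record):
            groups[key].append((value, src))
    outputs, provenance = {}, {}
    for key in sorted(groups):
        for out_key, out_value, sources in reducer(key, groups[key]):
            outputs[out_key] = out_value
            provenance[out_key] = sources
    return outputs, provenance


def trace_back(provenance, records, out_key):
    """Return the input records that contributed to one output element."""
    return [records[i] for i in provenance[out_key]]


# User-written functions: plain word count, with no provenance logic.
def count_words_map(line):
    return [(word, 1) for word in line.split()]


def count_words_reduce(word, counts):
    return [(word, sum(counts))]


tweets = ["good movie", "bad movie", "good good day"]
outputs, provenance = run_job(tweets, count_words_map, count_words_reduce)
```

Calling trace_back(provenance, tweets, "good") returns the two input lines that contain "good", which is the kind of drill-down from an output element to its contributing inputs that the abstract describes. A multi-job workflow would chain such traces backward through each job, which this single-job sketch does not attempt.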

Item Type: Conference or Workshop Item (Paper)
ID Code: 995
Deposited By: Hyunjung Park
Deposited On: 22 Mar 2011 23:49
Last Modified: 17 Jul 2011 11:03
