Dalvi, Nilesh and Parameswaran, Aditya and Rastogi, Vibhor Minimizing Uncertainty in Pipelines. Technical Report. Stanford InfoLab. (Publication Note: Extended Version of Paper Published at NIPS 2012)
BibTeX | DublinCore | EndNote | HTML |
![]()
| PDF 339Kb |
Abstract
In this paper, we consider the problem of debugging large pipelines by human labeling. We represent the execution of a pipeline using a directed acyclic graph of AND and OR nodes, where each node represents a data item produced by some operator in the pipeline. We assume that each operator assigns a confidence to each of its output data. We want to reduce the uncertainty in the output by issuing queries to a human, where a query consists of checking if a given data item is correct. In this paper, we consider the problem of asking the optimal set of queries to minimize the resulting output uncertainty. We perform a detailed evaluation of the complexity of the problem for various classes of graphs. We give efficient algorithms for the problem for trees, and show that, for a general dag, the problem is intractable.
Item Type: | Techreport (Technical Report) |
---|---|
Uncontrolled Keywords: | human labeling, crowdsourcing, probabilistic databases, lineage, workflows, debugging |
ID Code: | 1019 |
Deposited By: | Aditya Parameswaran |
Deposited On: | 29 Nov 2011 12:12 |
Last Modified: | 08 Nov 2012 16:06 |
Download statistics
Repository Staff Only: item control page