Park, Hyunjung and Parameswaran, Aditya and Widom, Jennifer (2012) Query Processing over Crowdsourced Data. Technical Report. Stanford InfoLab.
We are building Deco, a comprehensive system for answering declarative queries posed over stored relational data together with data gathered from the crowd. In this paper we present Deco's query processor, building on Deco's data model and query language presented earlier. It has been observed that query processing over crowdsourced data must contend with issues and tradeoffs involving cost, latency, and uncertainty that do not arise in traditional query processing. Deco's overall objective in query execution is to maximize parallelism while fetching data from the crowd (to keep latency low), but only when the parallelism will not issue too many tasks (which would increase cost). Meeting this objective requires a number of changes from traditional query execution. First, Deco's query processor uses a hybrid execution model, which respects Deco semantics while enabling our objective. Second, our objective requires prioritizing accesses to crowdsourced data, which turns out to be an interesting NP-hard problem. Finally, because Deco incorporates resolution functions to handle the uncertainty in crowdsourced data, query execution bears as much similarity to incremental view maintenance as to a traditional iterator model. The paper includes initial experimental results, focusing primarily on how our query execution model and access prioritization scheme maximize parallelism without increasing cost.
Item Type: Technical Report
Deposited By: Hyunjung Park
Deposited On: 22 Aug 2012 13:34
Last Modified: 17 Aug 2013 00:14