Stanford InfoLab Publication Server

Crowdsourcing Structured Data

Park, Hyunjung (2014) Crowdsourcing Structured Data. PhD thesis, Stanford University.


Crowdsourcing can be used to incorporate human computation into a variety of data-intensive tasks that are difficult for computers alone to solve well. Crowd-powered algorithms treat humans as processing units, while crowdsourced data uses humans as a data source. We present two different approaches to the second problem: collecting data from the crowd. We first present Deco, a system for "declarative crowdsourcing." Given a declarative query posed over a relational database, Deco uses the microtask approach to ask specific questions to the crowd, augmenting existing data to produce the query result. After briefly describing Deco's data model and query language, we focus on how Deco's query execution engine and query optimizer work together to produce high-quality query results while minimizing monetary cost and reducing latency. Second, we present CrowdFill, an alternative approach for collecting structured data from the crowd. Instead of posing specific questions as microtasks, CrowdFill shows a partially-filled table to all participating workers; these workers contribute by filling in empty cells, as well as upvoting and downvoting data entered by other workers. We describe how the system uses our primitive operations to guide data collection towards prespecified constraints while providing an intuitive data entry interface. We also describe CrowdFill's compensation scheme that encourages useful work while adhering to a monetary budget.
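
The table-filling interaction described above can be illustrated with a small sketch. This is not CrowdFill's actual implementation or API: the class names (`Row`, `SharedTable`), the net-vote threshold, and the sample data are all hypothetical, chosen only to show how filling empty cells and up/downvoting rows might combine into a set of accepted rows. Constraints and the compensation scheme are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Row:
    """One candidate row: column values (None marks an empty cell) and a net vote count."""
    values: Dict[str, Optional[str]]
    votes: int = 0


@dataclass
class SharedTable:
    """Hypothetical shared table that every worker sees and edits."""
    columns: List[str]
    rows: List[Row] = field(default_factory=list)

    def add_empty_row(self) -> int:
        """Append a row with all cells empty and return its index."""
        self.rows.append(Row({c: None for c in self.columns}))
        return len(self.rows) - 1

    def fill(self, row_id: int, column: str, value: str) -> bool:
        """Fill an empty cell; refuse to overwrite a value that is already there."""
        row = self.rows[row_id]
        if row.values[column] is not None:
            return False
        row.values[column] = value
        return True

    def upvote(self, row_id: int) -> None:
        self.rows[row_id].votes += 1

    def downvote(self, row_id: int) -> None:
        self.rows[row_id].votes -= 1

    def accepted(self, min_votes: int = 1) -> List[Dict[str, Optional[str]]]:
        """Return complete rows whose net votes reach the (assumed) threshold."""
        return [
            r.values
            for r in self.rows
            if all(v is not None for v in r.values.values()) and r.votes >= min_votes
        ]


# Illustrative usage with made-up data.
table = SharedTable(["name", "nationality"])
i = table.add_empty_row()
table.fill(i, "name", "Lionel Messi")
table.fill(i, "nationality", "Argentina")
table.upvote(i)

j = table.add_empty_row()
table.fill(j, "name", "Neymar")  # row j stays incomplete, so it is not accepted
```

In this sketch a row counts only when it is complete and has enough net upvotes; a real system would also need to steer workers toward the prespecified constraints and track each contribution for payment.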

Item Type: Thesis (PhD)
ID Code: 1095
Deposited By: Hyunjung Park
Deposited On: 03 Jun 2014 13:47
Last Modified: 03 Jun 2014 13:47
