Stanford InfoLab Publication Server

DataSift: An Expressive and Accurate Crowd-Powered Search Toolkit

Parameswaran, Aditya and Teh, Ming Han and Garcia-Molina, Hector and Widom, Jennifer. DataSift: An Expressive and Accurate Crowd-Powered Search Toolkit. Technical Report. Stanford InfoLab.




Traditional information retrieval systems have limited functionality. For instance, they cannot adequately support queries containing non-textual fragments such as images or videos, queries that are very long or ambiguous, or semantically rich queries over non-textual corpora. In this paper, we present DataSift, an expressive and accurate crowd-powered search toolkit that can connect to any corpus. We provide a number of alternative configurations for DataSift using crowdsourced and automated components, and demonstrate 2–3x gains in precision over traditional retrieval schemes in experiments on real corpora. We also present our results on determining suitable parameter values for those configurations, along with a number of insights learned along the way.

Item Type: Techreport (Technical Report)
Uncontrolled Keywords: crowd-powered search, information retrieval, human computation, crowd algorithms
ID Code: 1068
Deposited By: Aditya Parameswaran
Deposited On: 20 May 2013 14:16
Last Modified: 20 May 2013 14:16
