Stanford InfoLab Publication Server

Attribute-based Crowd Entity Resolution

Khan, Asif R. and Garcia-Molina, Hector (2016) Attribute-based Crowd Entity Resolution. In: CIKM '16 Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, October 24-28, 2016, Indianapolis, Indiana.

Abstract

We study the problem of using the crowd to perform entity resolution (ER) on a set of records. For many types of records, especially those involving images, such a task can be difficult for machines, but relatively easy for humans. Typical crowd-based ER approaches ask workers for pairwise judgments between records, which quickly becomes prohibitively expensive even for moderate numbers of records. In this paper, we reduce the cost of pairwise crowd ER approaches by soliciting the crowd for attribute labels on records, and then asking for pairwise judgments only between records with similar sets of attribute labels. However, due to errors induced by crowd-based attribute labeling, a naive attribute-based approach becomes extremely inaccurate even with few attributes. To combat these errors, we use error mitigation strategies which allow us to control the accuracy of our results while maintaining significant cost reductions. We develop a probabilistic model which allows us to determine the optimal, lowest-cost combination of error mitigation strategies needed to achieve a minimum desired accuracy. We test our approach with actual crowdworkers on a dataset of celebrity images, and find that our results yield crowd ER strategies which achieve high accuracy yet are significantly lower cost than pairwise-only approaches.
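The cost reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `candidate_pairs`, the attribute names, and the sample labels are all hypothetical, and "similar sets of attribute labels" is simplified here to exact equality of labels. The sketch groups records by their crowd-provided attribute labels and generates pairwise questions only within each group.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(labels):
    """Return record pairs that share an identical set of attribute labels.

    `labels` maps a record id to a dict of attribute -> label.
    Exact label equality is an assumption made for this sketch; the
    paper speaks of "similar" label sets.
    """
    blocks = defaultdict(list)
    for record_id, attrs in labels.items():
        # Records with the same attribute labels fall into the same block.
        blocks[tuple(sorted(attrs.items()))].append(record_id)

    pairs = []
    for members in blocks.values():
        # Pairwise questions are only generated inside a block.
        pairs.extend(combinations(sorted(members), 2))
    return sorted(pairs)


# Hypothetical crowd-provided attribute labels for five image records.
labels = {
    "r1": {"gender": "f", "hair": "dark"},
    "r2": {"gender": "f", "hair": "dark"},
    "r3": {"gender": "m", "hair": "dark"},
    "r4": {"gender": "m", "hair": "dark"},
    "r5": {"gender": "m", "hair": "light"},
}

pairs = candidate_pairs(labels)
all_pairs = len(labels) * (len(labels) - 1) // 2

print(pairs)
print(len(pairs), "pairwise questions instead of", all_pairs)
```

In this toy input, only two of the ten possible pairs are sent to the crowd. The sketch also shows why labeling errors matter: a single wrong label places a record in the wrong group, so its true matches are never compared, which is the inaccuracy the error mitigation strategies are meant to control.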

Item Type: Conference or Workshop Item (Paper)
ID Code: 1147
Deposited By: Asif Khan
Deposited On: 11 Nov 2016 16:34
Last Modified: 11 Nov 2016 16:34
