Entity Resolution with Crowd Errors

Verroios, Vasilis and Garcia-Molina, Hector. Entity Resolution with Crowd Errors. Technical Report, Stanford InfoLab.


Given a set of records, an entity resolution (ER) algorithm finds the records that refer to the same real-world entity. Humans can often determine whether two records refer to the same entity, so we study the problem of selecting which questions to ask error-prone humans. We give a Maximum Likelihood formulation for the problem of finding the "most beneficial" questions to ask next. Our theoretical results lead to a lightweight and practical algorithm, bDENSE, for selecting questions to ask humans. Our experimental results show that bDENSE reaches an accurate outcome more quickly than two recently proposed approaches. Moreover, through our experimental evaluation, we identify the strengths and weaknesses of all three approaches.
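To make the idea of a Maximum Likelihood view of crowd answers concrete, here is a minimal sketch. It is NOT bDENSE and does not select questions. It only scores candidate clusterings by how likely they make a set of noisy yes/no answers. It assumes a hypothetical error model in which every human answer is wrong independently with the same probability `error_rate`. The function names and the brute-force search are illustrative choices, feasible only for a handful of records.

```python
import math


def set_partitions(items):
    """Yield every partition of `items` into clusters (lists of records)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing cluster in turn...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ...or into a new singleton cluster.
        yield [[first]] + partition


def log_likelihood(partition, answers, error_rate):
    """Log-probability of the crowd answers if `partition` is the true clustering.

    Assumption (hypothetical model): each answer is wrong independently
    with probability `error_rate`.
    """
    cluster_of = {r: c for c, cluster in enumerate(partition) for r in cluster}
    total = 0.0
    for i, j, said_same in answers:
        truly_same = cluster_of[i] == cluster_of[j]
        p = 1 - error_rate if said_same == truly_same else error_rate
        total += math.log(p)
    return total


def ml_clustering(n_records, answers, error_rate=0.2):
    """Brute-force maximum-likelihood clustering over all partitions of the records."""
    return max(
        set_partitions(list(range(n_records))),
        key=lambda part: log_likelihood(part, answers, error_rate),
    )


# Each answer is (record_i, record_j, "same entity?"); pairs may be asked repeatedly.
answers = [(0, 1, True), (0, 1, True), (1, 2, True), (1, 2, True),
           (0, 2, False), (2, 3, False)]
best = ml_clustering(4, answers)
print(sorted(sorted(c) for c in best))  # [[0, 1, 2], [3]]
```

Here the lone "no" answer for (0, 2) is outvoted: merging 0, 1 and 2 agrees with five of the six answers, which no other clustering achieves. Because the search space grows with the Bell numbers, a practical method must avoid enumerating partitions.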

Item Type: Technical Report
ID Code: 1097
Deposited By: vasilis verroios
Deposited On: 03 Aug 2014 14:42
Last Modified: 03 Aug 2014 14:42

