Stanford InfoLab Publication Server

Entity Resolution with Crowd Errors

Verroios, Vasilis and Garcia-Molina, Hector. Entity Resolution with Crowd Errors. Technical Report. Stanford InfoLab.

This is the latest version of this item.

Abstract

Given a set of records, an entity resolution (ER) algorithm finds the records that refer to the same real-world entity. Humans can often determine whether two records refer to the same entity, so we study the problem of selecting which questions to ask error-prone humans. We give a Maximum Likelihood formulation for the problem of finding the "most beneficial" questions to ask next. Our theoretical results lead to a lightweight and practical algorithm, bDENSE, for selecting these questions. Our experimental results show that bDENSE reaches an accurate outcome more quickly than two recently proposed approaches. Moreover, through our experimental evaluation, we identify the strengths and weaknesses of all three approaches.
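
As a toy illustration of the Maximum Likelihood idea mentioned in the abstract, the sketch below scores candidate clusterings of a few records against noisy yes/no answers and picks the most likely one. It is not the bDENSE algorithm, and it does not cover question selection; the report itself defines those. The assumptions here are ours: every answer is independent, every answer is wrong with the same probability `error_rate`, and the record set is small enough to enumerate all partitions by brute force. All function names (`partitions`, `log_likelihood`, `most_likely_clustering`) are hypothetical.

```python
import math


def partitions(items):
    """Yield every partition of `items` as a list of clusters (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Place `first` into each existing cluster in turn...
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        # ...or into a new cluster of its own.
        yield [[first]] + part


def log_likelihood(clustering, answers, error_rate):
    """Log-probability of the observed answers if `clustering` were the truth.

    Assumes (our simplification) independent answers with one shared error rate.
    """
    cluster_of = {r: idx for idx, cluster in enumerate(clustering) for r in cluster}
    total = 0.0
    for a, b, said_same in answers:
        truly_same = cluster_of[a] == cluster_of[b]
        # A correct answer has probability 1 - error_rate, a wrong one error_rate.
        total += math.log(1 - error_rate if said_same == truly_same else error_rate)
    return total


def most_likely_clustering(records, answers, error_rate=0.2):
    """Brute-force search over all partitions; feasible only for tiny inputs."""
    return max(partitions(records), key=lambda c: log_likelihood(c, answers, error_rate))


# Each answer is (record_a, record_b, human_said_same_entity).
# The pair (0, 1) was asked three times and one human disagreed.
answers = [(0, 1, True), (0, 1, True), (0, 1, False), (1, 2, False), (0, 2, False)]
best = most_likely_clustering([0, 1, 2], answers)
print(best)
```

The number of partitions grows very quickly with the number of records, so exhaustive search of this kind is only useful for building intuition, which is one reason a lightweight selection algorithm is of practical interest.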

Item Type: Techreport (Technical Report)
ID Code: 1097
Deposited By: Vasilis Verroios
Deposited On: 03 Aug 2014 14:42
Last Modified: 03 Aug 2014 14:42
