Verroios, Vasilis and Garcia-Molina, Hector. Entity Resolution with Crowd Errors. Technical Report. Stanford InfoLab.
Given a set of records, an entity resolution (ER) algorithm finds the records that refer to the same real-world entity. Humans can often determine whether two records refer to the same entity, so we study the problem of selecting which questions to ask error-prone humans. We give a Maximum Likelihood formulation for the problem of finding the "most beneficial" questions to ask next. Our theoretical results lead to a lightweight and practical algorithm, bDENSE, for selecting questions to ask humans. Our experimental results show that bDENSE reaches an accurate outcome more quickly than two recently proposed approaches. Our experimental evaluation also identifies the strengths and weaknesses of all three approaches.
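To make the Maximum Likelihood idea concrete, here is a minimal sketch of how noisy pairwise answers can be scored against candidate clusterings. It is not bDENSE and is not taken from the report: the function names, the single fixed `error_rate`, the assumption that answers are independent, and the toy data are all hypothetical, and the brute-force search over partitions is only feasible for a handful of records. It also covers only the "which clustering best explains the answers" part, not the selection of the next question.

```python
from math import log

def partitions(items):
    """Yield every way to split `items` into non-empty clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing cluster in turn...
        for i, cluster in enumerate(smaller):
            yield smaller[:i] + [[first] + cluster] + smaller[i + 1:]
        # ...or give it a cluster of its own.
        yield [[first]] + smaller

def log_likelihood(clustering, answers, error_rate):
    """Log-probability of the observed answers if `clustering` were the truth.

    Assumption (ours, for illustration): each human answer is wrong
    independently with probability `error_rate`, with 0 < error_rate < 1.
    """
    cluster_of = {r: i for i, c in enumerate(clustering) for r in c}
    total = 0.0
    for (a, b), votes in answers.items():
        same = cluster_of[a] == cluster_of[b]
        for vote in votes:
            # A vote agrees with the clustering with probability 1 - error_rate.
            total += log(1 - error_rate if vote == same else error_rate)
    return total

def most_likely_clustering(records, answers, error_rate=0.2):
    """Exhaustively pick the clustering that maximizes the likelihood."""
    return max(
        partitions(records),
        key=lambda c: log_likelihood(c, answers, error_rate),
    )

# Toy data: True means "same entity", False means "different entities".
records = ["a", "b", "c"]
answers = {
    ("a", "b"): [True, True, False],  # humans disagree on this pair
    ("b", "c"): [False, False],
    ("a", "c"): [False],
}
best = most_likely_clustering(records, answers)
print(sorted(sorted(c) for c in best))
```

With this toy input, grouping `a` with `b` and leaving `c` alone agrees with five of the six answers, more than any other clustering, so it has the highest likelihood under the assumed error model.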
|Item Type:||Techreport (Technical Report)|
|Deposited By:||vasilis verroios|
|Deposited On:||24 Feb 2014 19:25|
|Last Modified:||03 Aug 2014 14:42|