Tachibana, Makoto and Garcia-Molina, Hector (2009) Joint Entity Resolution. Technical Report. Stanford InfoLab.
Entity resolution (ER) is the process of matching records that represent the same real-world entity and then merging them. We consider the ER problem for two related datasets, where a record in one dataset can refer to a record in the other, so that an ER process running on one dataset can affect the ER process on the other. We formalize a joint ER model for datasets that reference each other, treating the match and merge functions as black boxes. We identify important properties of the match and merge functions that, if satisfied, allow much more efficient ER. We provide four algorithms that run ER on a pair of datasets. We show that our parallel algorithms have shorter runtimes than naive alternate algorithms. We also introduce improvements to our parallel algorithms that result in fewer feature comparisons.
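To make the black-box view of match and merge concrete, the following is a minimal sketch of a generic single-dataset ER loop in Python. It is not one of the four algorithms of the report, and it does not model cross-dataset references. The record representation (a frozenset of attribute/value pairs) and the names resolve, same_email, and union_merge are assumptions made for illustration only.

```python
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical record representation: a set of (attribute, value) pairs.
Record = FrozenSet[Tuple[str, str]]


def resolve(
    records: List[Record],
    match: Callable[[Record, Record], bool],
    merge: Callable[[Record, Record], Record],
) -> List[Record]:
    """Repeatedly merge matching records until no pair matches.

    match and merge are opaque callables; the loop never looks inside them.
    """
    pending = list(records)
    resolved: List[Record] = []
    while pending:
        current = pending.pop()
        # Find one already-resolved record that matches the current one.
        partner = next((r for r in resolved if match(current, r)), None)
        if partner is None:
            resolved.append(current)
        else:
            # The merged record may match further records, so re-queue it.
            resolved.remove(partner)
            pending.append(merge(current, partner))
    return resolved


def same_email(r1: Record, r2: Record) -> bool:
    """Illustrative match function: records match if they share an email."""
    emails1 = {value for key, value in r1 if key == "email"}
    emails2 = {value for key, value in r2 if key == "email"}
    return bool(emails1 & emails2)


def union_merge(r1: Record, r2: Record) -> Record:
    """Illustrative merge function: keep every attribute/value pair."""
    return r1 | r2


people = [
    frozenset({("name", "J. Doe"), ("email", "jd@example.com")}),
    frozenset({("name", "John Doe"), ("email", "jd@example.com")}),
    frozenset({("name", "Ann Lee"), ("email", "al@example.com")}),
]
result = resolve(people, same_email, union_merge)
```

In this sketch the two records sharing an email address collapse into one record that carries both name values, while the third record is left unchanged. Swapping in a different match or merge function requires no change to resolve, which is the sense in which the functions are treated as black boxes here.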
|Item Type:||Techreport (Technical Report)|
|Uncontrolled Keywords:||entity resolution|
|Related URLs:||Project Homepage||http://infolab.stanford.edu/serf/|
|Deposited By:||Makoto Tachibana|
|Deposited On:||08 Jan 2009 16:26|
|Last Modified:||13 Jan 2009 12:07|