Whang, Steven Euijong and Menestrina, David and Koutrika, Georgia and Theobald, Martin and Garcia-Molina, Hector (2009) Entity Resolution with Iterative Blocking. In: SIGMOD 2009, June 29 - July 2, 2009, Providence, Rhode Island.
This is the latest version of this item.
Abstract
Entity Resolution (ER) is the problem of identifying which records in a database refer to the same real-world entity. An exhaustive ER process involves computing the similarities between pairs of records, which can be very expensive for large datasets. Various blocking techniques can be used to enhance the performance of ER by dividing the records into blocks in multiple ways and only comparing records within the same block. However, most blocking techniques process blocks separately and do not exploit the results of other blocks. In this paper, we propose an iterative blocking framework where the ER results of blocks are reflected to subsequently processed blocks. Blocks are now iteratively processed until no block contains any more matching records. Compared to simple blocking, iterative blocking may achieve higher accuracy because reflecting the ER results of blocks to other blocks may generate additional record matches. Iterative blocking may also be more efficient because processing a block now saves the processing time for other blocks. We implement a scalable iterative blocking system and demonstrate that iterative blocking is more accurate and efficient than blocking, especially for large datasets.
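The idea in the abstract can be illustrated with a small Python sketch. This is not the authors' system: the record layout, the rule that two records match when they share values on at least two attributes, the merge by set union, and the names `iterative_blocking` and `simple_blocking_pairs` are all assumptions made for illustration. For simplicity the sketch rebuilds every block after each merge, which is far less efficient than a scalable implementation would be.

```python
from itertools import combinations

def match(a, b, min_shared=2):
    # Assumed rule: records match if they share a value on >= min_shared attributes.
    return sum(1 for k in a.keys() & b.keys() if a[k] & b[k]) >= min_shared

def merge(a, b):
    # Assumed merge: union of the value sets of every attribute.
    return {k: a.get(k, set()) | b.get(k, set()) for k in a.keys() | b.keys()}

def build_blocks(records, criteria):
    # One block per (attribute, value); a record may fall into several blocks.
    blocks = {}
    for rid, rec in records.items():
        for attr in criteria:
            for value in rec.get(attr, ()):
                blocks.setdefault((attr, value), set()).add(rid)
    return blocks

def to_sets(raw):
    # Turn each raw record into {attribute: {value}} keyed by its list index.
    return {i: {k: {v} for k, v in r.items()} for i, r in enumerate(raw)}

def find_match(records, criteria):
    # Return the first matching pair found inside any block, or None.
    for ids in build_blocks(records, criteria).values():
        for a, b in combinations(sorted(ids), 2):
            if match(records[a], records[b]):
                return a, b
    return None

def simple_blocking_pairs(raw, criteria):
    # Baseline: each block is processed on its own; results are not propagated.
    records = to_sets(raw)
    pairs = set()
    for ids in build_blocks(records, criteria).values():
        for a, b in combinations(sorted(ids), 2):
            if match(records[a], records[b]):
                pairs.add((a, b))
    return sorted(pairs)

def iterative_blocking(raw, criteria):
    # Merged records replace their sources in all blocks; repeat until
    # no block contains a matching pair.
    records = to_sets(raw)
    sources = {i: {i} for i in records}  # which raw records each entry covers
    next_id = len(records)
    while (pair := find_match(records, criteria)) is not None:
        a, b = pair
        records[next_id] = merge(records.pop(a), records.pop(b))
        sources[next_id] = sources.pop(a) | sources.pop(b)
        next_id += 1
    return sorted(sorted(s) for s in sources.values())

# Hypothetical sample data, blocked by zip code and by phone number.
RAW = [
    {"name": "J. Doe", "zip": "94305", "phone": "555-1234"},
    {"name": "J. Doe", "zip": "94305", "email": "jd@example.org"},
    {"name": "John Doe", "zip": "10001", "phone": "555-1234", "email": "jd@example.org"},
]
CRITERIA = ["zip", "phone"]
```

With this sample data, `simple_blocking_pairs(RAW, CRITERIA)` finds only the pair of records 0 and 1, which share a name and a zip code. Record 2 shares just one attribute with each of them individually, so it is missed. `iterative_blocking(RAW, CRITERIA)` carries the merged record into the phone block, where it shares both a phone number and an email address with record 2, so all three records end up in one cluster.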
| Field | Value |
|---|---|
| Item Type | Conference or Workshop Item (Paper) |
| Uncontrolled Keywords | entity resolution, blocking, iterative blocking |
| Subjects | Computer Science > Data Mining |
| Projects | SERF |
| Related URLs | Project Homepage: http://infolab.stanford.edu/serf/ |
| ID Code | 915 |
| Deposited By | Steven Whang |
| Deposited On | 09 Apr 2009 00:12 |
| Last Modified | 22 Apr 2009 22:14 |
Available Versions of this Item
- Entity Resolution with Iterative Blocking. (deposited 21 Jun 2008 17:00)
- Entity Resolution with Iterative Blocking. (deposited 09 Apr 2009 00:12) [Currently Displayed]