Whang, Steven Euijong and Garcia-Molina, Hector (2010) Entity Resolution with Evolving Rules. In: PVLDB, September 13-17, 2010, Singapore.
This is the latest version of this item.
PDF (Published Version, 276Kb)
Abstract
Entity resolution (ER) identifies database records that refer to the same real-world entity. In practice, ER is not a one-time process, but is constantly improved as the data, schema and application are better understood. We address the problem of keeping the ER result up-to-date when the ER logic "evolves" frequently. A naïve approach that re-runs ER from scratch may not be tolerable for resolving large datasets. This paper investigates when and how we can instead exploit previous "materialized" ER results to save redundant work with evolved logic. We introduce algorithm properties that facilitate evolution, and we propose efficient rule evolution techniques for two clustering ER models: match-based clustering and distance-based clustering. Using real datasets, we illustrate the cost of materializations and the potential gains over the naïve approach.
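The central idea in the abstract is reusing a previously materialized ER result instead of re-running ER from scratch. A minimal sketch of that idea follows. It assumes a simple match-based clustering that takes the transitive closure of a Boolean match rule. It also assumes the rule evolves to a stricter one, meaning the new rule matching a pair implies the old rule matches it. Under that assumption every new match edge already lies inside an old cluster, so ER only needs to be re-run within each materialized cluster. The function names, the sample records, and the name/zip rules are hypothetical. This is not the paper's algorithm, and it covers neither distance-based clustering nor rules that become more permissive.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set

Record = Dict[str, str]
MatchRule = Callable[[Record, Record], bool]


def transitive_closure_er(ids: List[str], records: Dict[str, Record],
                          match: MatchRule) -> List[Set[str]]:
    """Hypothetical match-based clustering: connected components of the match graph."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair of records and merge the clusters of matching pairs.
    for a, b in combinations(ids, 2):
        if match(records[a], records[b]):
            parent[find(a)] = find(b)

    clusters: Dict[str, Set[str]] = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


def evolve_stricter(old_clusters: List[Set[str]], records: Dict[str, Record],
                    new_match: MatchRule) -> List[Set[str]]:
    """Re-resolve only inside each materialized (old) cluster.

    Assumption: new_match(r, s) implies old_match(r, s). Then no new match
    edge crosses two old clusters, so per-cluster ER equals ER from scratch.
    """
    result: List[Set[str]] = []
    for cluster in old_clusters:
        result.extend(transitive_closure_er(sorted(cluster), records, new_match))
    return result


# Hypothetical records and rules, for illustration only.
records = {
    "r1": {"name": "John Doe", "zip": "94305"},
    "r2": {"name": "John Doe", "zip": "94305"},
    "r3": {"name": "John Doe", "zip": "10001"},
    "r4": {"name": "Jane Roe", "zip": "94305"},
}
old_rule: MatchRule = lambda r, s: r["name"] == s["name"]
new_rule: MatchRule = lambda r, s: r["name"] == s["name"] and r["zip"] == s["zip"]

old = transitive_closure_er(sorted(records), records, old_rule)          # materialized
new_incremental = evolve_stricter(old, records, new_rule)               # reuses `old`
new_scratch = transitive_closure_er(sorted(records), records, new_rule)  # naive re-run
print(sorted(sorted(c) for c in new_incremental))
```

In this example the incremental run compares only the three pairs inside the old cluster {r1, r2, r3}, whereas the naive re-run compares all six pairs. Both produce the clusters {r1, r2}, {r3} and {r4}.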
| Field | Value |
|---|---|
| Item Type | Conference or Workshop Item (Paper) |
| Projects | SERF |
| ID Code | 974 |
| Deposited By | Steven Whang |
| Deposited On | 02 Jul 2010 10:43 |
| Last Modified | 08 Jul 2010 00:54 |
Available Versions of this Item
- Entity Resolution with Evolving Rules. (deposited 08 Mar 2010 08:24)
- Entity Resolution with Evolving Rules. (deposited 02 Jul 2010 10:43) [Currently Displayed]