Whang, Steven Euijong and Garcia-Molina, Hector (2010) Entity Resolution with Evolving Rules. In: PVLDB, September 13-17, 2010, Singapore.
This is the latest version of this item.
Entity resolution (ER) identifies database records that refer to the same real-world entity. In practice, ER is not a one-time process but is constantly improved as the data, schema and application are better understood. We address the problem of keeping the ER result up-to-date when the ER logic "evolves" frequently. A naïve approach that re-runs ER from scratch may not be tolerable for resolving large datasets. This paper investigates when and how we can instead exploit previous "materialized" ER results to save redundant work with evolved logic. We introduce algorithm properties that facilitate evolution, and we propose efficient rule evolution techniques for two clustering ER models: match-based clustering and distance-based clustering. Using real data sets, we illustrate the cost of materializations and the potential gains over the naïve approach.
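To make the idea of reusing a materialized ER result concrete, here is a minimal sketch under stated assumptions. It is not the paper's algorithm. It assumes match-based clustering is done by transitive closure (connected components) over a Boolean match rule, and that the rule evolves by becoming stricter, i.e. the new rule is the old rule AND an extra predicate. The function names `transitive_closure_er` and `evolve_stricter`, the record fields, and the sample data are all hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set

Record = Dict[str, str]
Rule = Callable[[Record, Record], bool]
Clustering = Set[FrozenSet[int]]


def transitive_closure_er(ids: List[int], records: Dict[int, Record], rule: Rule) -> Clustering:
    """Hypothetical match-based ER: records linked by any chain of matches share a cluster."""
    parent = {i: i for i in ids}

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair and union the ones the rule says match.
    for a, b in combinations(ids, 2):
        if rule(records[a], records[b]):
            parent[find(a)] = find(b)

    clusters: Dict[int, Set[int]] = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return {frozenset(c) for c in clusters.values()}


def evolve_stricter(old_result: Clustering, records: Dict[int, Record], new_rule: Rule) -> Clustering:
    """Assumes new_rule implies the old rule, so re-run ER only inside each old cluster."""
    new_result: Clustering = set()
    for cluster in old_result:
        new_result |= transitive_closure_er(sorted(cluster), records, new_rule)
    return new_result


# Hypothetical sample data and rules.
records = {
    1: {"name": "John Smith", "zip": "94305"},
    2: {"name": "John Smith", "zip": "10001"},
    3: {"name": "J. Smith", "zip": "94305"},
    4: {"name": "Mary Jones", "zip": "94305"},
}


def old_rule(r: Record, s: Record) -> bool:
    # Match on last name token only.
    return r["name"].split()[-1] == s["name"].split()[-1]


def new_rule(r: Record, s: Record) -> bool:
    # Stricter rule: old rule AND same zip code.
    return old_rule(r, s) and r["zip"] == s["zip"]


old = transitive_closure_er(sorted(records), records, old_rule)        # materialized result
incremental = evolve_stricter(old, records, new_rule)                  # reuses old clusters
scratch = transitive_closure_er(sorted(records), records, new_rule)    # naive re-run
print(sorted(sorted(c) for c in incremental))  # [[1, 3], [2], [4]]
```

The reuse is safe here because any pair linked under the stricter rule is also linked under the old rule, so every new cluster lies inside some old cluster and pairs spanning two old clusters never need to be compared. Other kinds of rule changes (for example, relaxing the rule) need different reasoning.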
Item Type: Conference or Workshop Item (Paper)
Deposited By: Steven Whang
Deposited On: 02 Jul 2010 10:43
Last Modified: 08 Jul 2010 00:54
Available Versions of this Item
- Entity Resolution with Evolving Rules. (deposited 08 Mar 2010 08:24)
- Entity Resolution with Evolving Rules. (deposited 02 Jul 2010 10:43) [Currently Displayed]