Stanford InfoLab Publication Server

Additional Experiments on Negative Rules

Whang, Steven Euijong and Benjelloun, Omar and Garcia-Molina, Hector (2008) Additional Experiments on Negative Rules. Technical Report. Stanford.




We implemented the General and Enhanced algorithms described in our technical report "Generic Entity Resolution with Negative Rules" and ran extensive experiments on a comparison shopping dataset provided by Yahoo!. In this application, hundreds of thousands of records arrive regularly from different online stores and must be resolved before they are used to answer customer queries. Because of the volume of data, we used blocking techniques to partition the data into independent clusters and then applied our algorithms to each cluster. Our experiments used a partition containing records with the substring "iPod" in their titles; we refer to these as iPod-related records. The algorithms were implemented in Java, and the experiments were run on a 1.8GHz AMD Opteron processor with 20.4GB of memory. Although our server had multiple processors, we did not exploit parallelism.
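The blocking step mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation (which was written in Java and is not shown here); it only assumes that blocking means grouping records by a keyword found in the title. The function name `block_by_keyword`, the record fields, the sample titles, and the keyword list are all hypothetical.

```python
from collections import defaultdict


def block_by_keyword(records, keywords):
    """Group records into disjoint blocks keyed by the first keyword in the title.

    Hypothetical helper: matching is a case-sensitive substring test, and a
    record that matches no keyword is left out of every block.
    """
    blocks = defaultdict(list)
    for record in records:
        for keyword in keywords:
            if keyword in record["title"]:
                blocks[keyword].append(record)
                break  # at most one block per record, so blocks stay independent
    return dict(blocks)


# Made-up sample records, for illustration only.
records = [
    {"id": 1, "title": "Apple iPod nano 4GB"},
    {"id": 2, "title": "iPod shuffle 1GB silver"},
    {"id": 3, "title": "Canon PowerShot camera"},
]

blocks = block_by_keyword(records, ["iPod", "camera"])
ipod_records = blocks["iPod"]  # the "iPod-related" block in this toy example
```

Because the blocks are disjoint, a resolution algorithm could then be applied to each block separately, which is the property the abstract relies on when it describes the clusters as independent.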

Item Type: Techreport (Technical Report)
Subjects: Computer Science > Data Integration and Mediation
Projects: Information Integration
Related URLs: Project Homepage
ID Code: 852
Deposited By: Import Account
Deposited On: 25 Mar 2008 17:00
Last Modified: 10 Dec 2008 16:30
