Stanford InfoLab Publication Server

Entity Resolution and Tracking on Social Networks (PhD Thesis)

Vesdapunt, Norases. Entity Resolution and Tracking on Social Networks (PhD Thesis). Technical Report. Stanford InfoLab.

Abstract

In this thesis we study two interesting aspects of the problem of Entity Resolution (ER). The goal of ER is to identify and merge records that refer to the same underlying entity. The recent rise in adoption of social networks (Facebook, Google+, Twitter, and others) introduces new issues and twists to the traditional ER problem: crowdsourcing and limited information. We first study a hybrid human-machine approach to solving ER problems. Machine learning models can predict the probabilities of entity pairs referring to the same entity. However, machines make mistakes. Humans can help verify the equality of entity pairs, and social systems like Facebook allow users to help resolve entities on their platforms. We propose hybrid human-machine strategies with theoretical guarantees that leverage transitivity relations (e.g. a = c can be inferred given a = b and b = c). Next, we study the problem of ER with limited information. Social systems impose limits on API calls that constrain access to their full social graphs. We focus on the resolution of a single node g from one social graph G against a second social graph T. We want to find the best match for g in T, by dynamically probing T (using a public API), limited by the number of API calls that these social systems allow. We propose two ER strategies that are designed for limited information and can be adapted to different API limits. Finally, we study the problem of updating social graph snapshots when one has limited information. Effective social network ER requires up-to-date snapshots. Limited by the number of API calls that social systems allow, we seek to efficiently update a snapshot. We want to avoid re-crawling all of the nodes and minimize the number of API calls. We propose novel snapshot update strategies that are designed for limited information and can be adapted to different levels of staleness.
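
The transitivity idea mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's algorithm, only a minimal union-find illustration under stated assumptions: the class `TransitiveResolver`, the function `resolve`, the callback `ask_human`, and the sample records are all hypothetical names introduced here. Candidate pairs are assumed to arrive in some fixed order (for instance, sorted by a machine-predicted match probability), and human answers are assumed to be correct. A human is asked only when the answer does not already follow from earlier answers.

```python
class TransitiveResolver:
    """Stores human answers and infers new ones by transitivity (illustrative only)."""

    def __init__(self):
        self.parent = {}    # union-find forest over record ids
        self.distinct = []  # (a, b) pairs a human said are different entities

    def find(self, x):
        """Return the representative of x's cluster, with path halving."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def infer(self, a, b):
        """Return True or False if earlier answers decide (a, b), else None."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True  # a = b follows from a chain of confirmed matches
        for x, y in self.distinct:
            if {self.find(x), self.find(y)} == {ra, rb}:
                return False  # the two clusters are already known to differ
        return None

    def record(self, a, b, same):
        """Store one human answer for the pair (a, b)."""
        if same:
            self.parent[self.find(a)] = self.find(b)
        else:
            self.distinct.append((a, b))


def resolve(pairs, ask_human):
    """Process pairs in order, asking a human only when nothing can be inferred."""
    resolver = TransitiveResolver()
    questions = 0
    for a, b in pairs:
        if resolver.infer(a, b) is None:
            resolver.record(a, b, ask_human(a, b))
            questions += 1
    return resolver, questions


# Hypothetical ground truth: records a, b, c are one entity; d is another.
truth = {"a": 1, "b": 1, "c": 1, "d": 2}
pairs = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("a", "d")]
resolver, questions = resolve(pairs, lambda x, y: truth[x] == truth[y])
print(questions)
```

In this toy run, the pair (a, c) is answered by positive transitivity and (a, d) by the earlier non-match between the two clusters, so fewer questions are asked than there are candidate pairs. The order in which pairs are processed affects how many questions can be skipped.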

Item Type: Technical Report
ID Code: 1144
Deposited By: Norases Vesdapunt
Deposited On: 20 Aug 2016 17:03
Last Modified: 20 Aug 2016 17:03