Identifying Users in Social Networks with Limited Information

Vesdapunt, Norases and Garcia-Molina, Hector (2014) Identifying Users in Social Networks with Limited Information. Technical Report. Stanford InfoLab.

Warning: There is a more recent version of this item available.

PDF - Draft Version


We study the problem of Entity Resolution (ER) with limited information. ER is the problem of identifying and merging records that represent the same real-world entity. In this paper, we focus on resolving a single node g from one social graph (Google+ in our case) against a second social graph (Twitter in our case). We want to find the best match for g in Twitter by dynamically probing the Twitter graph through a public API, subject to the limit on the number of API calls that social systems allow. We propose a strategy that is designed for limited information and is robust to rule changes. We evaluate our strategy against a random baseline on a real dataset and show that it is 46% faster than random and achieves 88.3% accuracy. We also propose a greedy strategy, a natural extension of bipartite matching subject to limited API calls, which achieves 85% accuracy.
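To make the problem setting concrete, the following is a minimal Python sketch of resolving one node against a second graph under a fixed budget of API calls. It is not the strategy proposed in the report. The `MockGraphAPI` class, its `search_by_name` and `get_friends` methods, the Jaccard scoring of friend sets, and the sample data are all assumptions made for illustration, and no real Twitter or Google+ API is used.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when the probe budget is used up."""


@dataclass
class MockGraphAPI:
    """Hypothetical stand-in for a rate-limited social-graph API."""
    names: dict      # node id -> display name
    friends: dict    # node id -> set of friend names
    budget: int      # maximum number of calls allowed
    calls: int = 0   # calls made so far

    def _spend(self):
        # Every probe costs one call; refuse once the budget is exhausted.
        if self.calls >= self.budget:
            raise BudgetExceeded
        self.calls += 1

    def search_by_name(self, name):
        self._spend()
        return [n for n, shown in self.names.items()
                if shown.lower() == name.lower()]

    def get_friends(self, node):
        self._spend()
        return self.friends.get(node, set())


def jaccard(a, b):
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(g_name, g_friends, api):
    """Return (best_candidate, score), probing until the budget runs out."""
    best, best_score = None, -1.0
    try:
        for cand in api.search_by_name(g_name):
            score = jaccard(g_friends, api.get_friends(cand))
            if score > best_score:
                best, best_score = cand, score
    except BudgetExceeded:
        pass  # keep the best match found so far
    return best, best_score


# Illustrative data only: two candidates share the queried name.
sample_names = {"t1": "Alex Kim", "t2": "Alex Kim", "t3": "Sam Lee"}
sample_friends = {"t1": {"bo", "cy"}, "t2": {"bo", "cy", "di"}, "t3": {"bo"}}

api = MockGraphAPI(names=sample_names, friends=sample_friends, budget=3)
best, score = resolve("Alex Kim", {"bo", "cy", "di"}, api)
```

With a budget of three calls the sketch can probe both same-name candidates and picks `t2`. With a budget of two it has to stop after the first candidate and returns `t1`, which shows why the order in which candidates are probed matters when calls are scarce.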

