Stanford InfoLab Publication Server

Recovering Semantics of Tables on the Web

Venetis, Petros and Halevy, Alon and Madhavan, Jayant and Pasca, Marius and Shen, Warren and Wu, Fei and Miao, Gengxin and Wu, Chung (2011) Recovering Semantics of Tables on the Web. In: 37th International Conference on Very Large Data Bases (VLDB), Aug 29 -- Sep 3, Seattle, WA, USA.

This is the latest version of this item.

Abstract

The Web offers a corpus of over 100 million tables [6], but the meaning of each table is rarely explicit from the table itself. Header rows exist in few cases and even when they do, the attribute names are typically useless. We describe a system that attempts to recover the semantics of tables by enriching the table with additional annotations. Our annotations facilitate operations such as searching for tables and finding related tables. To recover semantics of tables, we leverage a database of class labels and relationships automatically extracted from the Web. The database of classes and relationships has very wide coverage, but is also noisy. We attach a class label to a column if a sufficient number of the values in the column are identified with that label in the database of class labels, and analogously for binary relationships. We describe a formal model for reasoning about when we have seen sufficient evidence for a label, and show that it performs substantially better than a simple majority scheme. We describe a set of experiments that illustrate the utility of the recovered semantics for table search and show that it performs substantially better than previous approaches. In addition, we characterize what fraction of tables on the Web can be annotated using our approach.
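The "simple majority scheme" that the abstract uses as a baseline can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `majority_labels`, the `threshold` parameter, the lowercase normalization of cell values, and the toy `class_db` dictionary are all assumptions made for this example. The formal model the abstract mentions is not specified here, so it is not reproduced.

```python
from collections import Counter


def majority_labels(column_values, class_db, threshold=0.5):
    """Return (label, support) pairs for labels held by more than
    `threshold` of the cells in a column.

    `class_db` is a hypothetical stand-in for the database of class
    labels: a dict mapping a normalized value to its known labels.
    """
    if not column_values:
        return []
    counts = Counter()
    for value in column_values:
        key = value.strip().lower()  # assumed normalization
        # Count each label at most once per cell.
        for label in set(class_db.get(key, ())):
            counts[label] += 1
    total = len(column_values)
    winners = [
        (label, count / total)
        for label, count in counts.items()
        if count / total > threshold
    ]
    # Highest support first; ties broken alphabetically.
    return sorted(winners, key=lambda pair: (-pair[1], pair[0]))


# Toy data, invented for illustration only.
class_db = {
    "paris": ["city", "capital"],
    "berlin": ["city", "capital"],
    "munich": ["city"],
    "texas": ["state"],
}
column = ["Paris", "Berlin", "Munich", "Texas"]

print(majority_labels(column, class_db))
```

With this toy data, "city" is attached because three of the four cells carry it, while "capital" (two of four) does not clear the strict majority. A noisy label database can push a correct label below such a fixed cutoff, which is the kind of weakness the abstract says the formal model improves on.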

Item Type: Conference or Workshop Item (Paper)
Related URLs: Author Homepage (http://stanford.edu/~venetis/)
ID Code: 1012
Deposited By: Petros Venetis
Deposited On: 29 Aug 2011 14:06
Last Modified: 07 Nov 2011 14:15
