Stanford InfoLab Publication Server

ArcSpread for Analyzing Web Archives

Soman, Siddhi and Chharjta, Arti and Bonomo, Alexander and Paepcke, Andreas (2012) ArcSpread for Analyzing Web Archives.

Warning: There is a more recent version of this item available.

PDF (ArcSpread spreadsheet metaphor for Web archive exploration.) - Submitted for Publication


We describe an architecture, partial implementation, and user study for ArcSpread. The vision for ArcSpread is to allow social scientists of the future, such as historians or political scientists, to analyze Web archives through a spreadsheet-like interface. Cells of these spreadsheets contain sets of objects rather than single items. Examples of objects are Web page, Image, and Word. Formulas perform set operations on cell contents. When new content must be acquired, the formulas access an SQLite cache or trigger operations on an underlying 60-core Hadoop cluster. Both the cluster and the spreadsheet formulas have access to a multi-terabyte Web archive. When users double-click on a loaded cell, a browser appropriate for the cell's content type is raised. We present an envisioned example interaction, sketch the implemented Hadoop and spreadsheet-level facilities, and describe the prototype summarizer.
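The set-valued cells and the cache-then-cluster lookup described above can be illustrated with a minimal Python sketch. It is not ArcSpread's implementation: the table layout, the names `make_cache`, `load_cell`, and `words_for`, the cell keys, and the sample words are all hypothetical, and a plain Python function stands in for a Hadoop job.

```python
import json
import sqlite3


def make_cache():
    """Create an in-memory SQLite table mapping a cell key to its members."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, members TEXT)")
    return conn


def load_cell(conn, key, compute):
    """Return the set for `key`, running `compute` only on a cache miss."""
    row = conn.execute(
        "SELECT members FROM cache WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:
        return frozenset(json.loads(row[0]))
    # Hypothetical stand-in for triggering a job on the cluster.
    members = frozenset(compute())
    conn.execute(
        "INSERT INTO cache (key, members) VALUES (?, ?)",
        (key, json.dumps(sorted(members))),
    )
    conn.commit()
    return members


backend_calls = []  # records which keys actually reached the "cluster"


def words_for(label, words):
    """Build a fake backend job that returns a fixed set of words."""
    def compute():
        backend_calls.append(label)
        return words
    return compute


conn = make_cache()
cell_a = load_cell(conn, "words:2008", words_for("2008", {"election", "economy", "vote"}))
cell_b = load_cell(conn, "words:2012", words_for("2012", {"election", "jobs", "vote"}))
# A second request for the same key is served from the cache.
cell_a_again = load_cell(conn, "words:2008", words_for("2008", set()))

# A "formula" is a set operation over cell contents.
common = cell_a & cell_b
```

In this sketch each cell holds a `frozenset`, so a formula is ordinary set algebra, and the backend is consulted only for keys the cache has not seen.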

Item Type: Conference or Workshop Item (Paper)
Projects: Digital Libraries
ID Code: 1037
Deposited By: Andreas Paepcke
Deposited On: 13 Apr 2012 15:35
Last Modified: 17 Apr 2012 08:32
