Stanford InfoLab Publication Server

ArcSpread for Analyzing Web Archives

Soman, Siddhi and Chharjta, Arti and Bonomo, Alexander and Paepcke, Andreas (2012) ArcSpread for Analyzing Web Archives. In: .


This is the latest version of this item.

PDF (ArcSpread spreadsheet metaphor for Web archive exploration.)


We describe an architecture, a partial implementation, and a user study for ArcSpread. The vision for ArcSpread is to allow future social scientists, such as historians or political scientists, to analyze Web archives through a spreadsheet-like interface. The cells of these spreadsheets contain sets of objects rather than single items; examples of object types are Web pages, images, and words. Formulas perform set operations on cell contents. When new content must be acquired, the formulas either access an SQLite cache or trigger operations on an underlying 60-core Hadoop cluster. Both this cluster and the spreadsheet formulas have access to a multi-terabyte Web archive. When users double-click a loaded cell, a browser appropriate for the cell's content type opens. We present an envisioned example interaction, sketch the implemented Hadoop-level and spreadsheet-level facilities, and describe the prototype summarizer.
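The cell-and-formula model described in the abstract can be illustrated with a minimal Python sketch. It is not the ArcSpread implementation. It assumes that a cell holds a frozenset of object identifiers tagged with one content type, that formula results are cached in SQLite keyed by the formula text, and that a Hadoop job can be stood in for by any callable. `Cell`, `ResultCache`, `evaluate`, `browser_for`, the `results` table schema, and the browser names are all hypothetical.

```python
import json
import sqlite3
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical spreadsheet cell holding a *set* of objects of one type."""
    content_type: str        # e.g. "WebPage", "Image", "Word"
    items: FrozenSet[str]    # object identifiers, e.g. URLs or words


def _check_types(a: Cell, b: Cell) -> None:
    # Set operations only make sense between cells of the same object type.
    if a.content_type != b.content_type:
        raise TypeError(f"cannot combine {a.content_type} with {b.content_type}")


def union(a: Cell, b: Cell) -> Cell:
    _check_types(a, b)
    return Cell(a.content_type, a.items | b.items)


def intersection(a: Cell, b: Cell) -> Cell:
    _check_types(a, b)
    return Cell(a.content_type, a.items & b.items)


def difference(a: Cell, b: Cell) -> Cell:
    _check_types(a, b)
    return Cell(a.content_type, a.items - b.items)


class ResultCache:
    """Hypothetical SQLite cache mapping a formula string to its result set."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results (formula TEXT PRIMARY KEY, items TEXT)"
        )

    def get(self, formula: str) -> Optional[FrozenSet[str]]:
        row = self.conn.execute(
            "SELECT items FROM results WHERE formula = ?", (formula,)
        ).fetchone()
        # None signals a cache miss; an empty set is a valid cached result.
        return None if row is None else frozenset(json.loads(row[0]))

    def put(self, formula: str, items: FrozenSet[str]) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO results VALUES (?, ?)",
            (formula, json.dumps(sorted(items))),
        )
        self.conn.commit()


def evaluate(
    formula: str,
    content_type: str,
    cache: ResultCache,
    run_on_cluster: Callable[[str], Iterable[str]],
) -> Cell:
    """Serve a formula from the cache, falling back to the (stubbed) cluster."""
    items = cache.get(formula)
    if items is None:
        # Cache miss: stand-in for launching a job on the Hadoop cluster.
        items = frozenset(run_on_cluster(formula))
        cache.put(formula, items)
    return Cell(content_type, items)


# Hypothetical mapping from content type to the viewer raised on double-click.
BROWSERS = {"WebPage": "page browser", "Image": "image viewer", "Word": "word list"}


def browser_for(cell: Cell) -> str:
    return BROWSERS.get(cell.content_type, "generic list view")


# Usage: the second evaluation of the same formula is answered by the cache.
calls = []


def fake_cluster(formula: str) -> Iterable[str]:
    calls.append(formula)
    return ["http://a.example/", "http://b.example/"]


cache = ResultCache()
pages = evaluate("pages_mentioning('election')", "WebPage", cache, fake_cluster)
again = evaluate("pages_mentioning('election')", "WebPage", cache, fake_cluster)
print(len(calls))  # prints 1
```

Keying the cache on the formula text keeps the sketch short; a real system would more likely normalize formulas or key on their inputs.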

Item Type: Conference or Workshop Item (Paper)
Projects: Digital Libraries
ID Code: 1038
Deposited By: Andreas Paepcke
Deposited On: 13 Apr 2012 15:40
Last Modified: 17 Apr 2012 08:32
