Soman, Siddhi and Chharjta, Arti and Bonomo, Alexander and Paepcke, Andreas (2012) ArcSpread for Analyzing Web Archives. In: .
This is the latest version of this item.
| PDF (ArcSpread spreadsheet metaphor for Web archive exploration.) 1571Kb |
Abstract
We describe an architecture, partial implementation, and user study for ArcSpread. The vision for ArcSpread is to allow social scientists of the future, such as historians or political scientists, to analyze Web archives through a spreadsheet-like interface. Cells of these spreadsheets contain sets of objects rather than single items. Examples of object types are Web page, Image, and Word. Formulas perform set operations on cell contents. When new content must be acquired, the formulas access an SQLite cache or trigger operations on an underlying 60-core Hadoop cluster. Both the cluster and the spreadsheet formulas have access to a multi-terabyte Web archive. When users double-click on a loaded cell, a browser appropriate for the cell content type is raised. We present an envisioned example interaction, sketch the implemented Hadoop and spreadsheet-level facilities, and describe the prototype summarizer.
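The cell model in the abstract (set-valued cells, set-operation formulas, an SQLite cache consulted before the cluster) can be illustrated with a minimal sketch. This is not the ArcSpread implementation: the names `CellCache`, `run_cluster_job`, and `intersect`, the query strings, and the table layout are all hypothetical, and the Hadoop cluster is replaced by a small in-memory stub.

```python
import json
import sqlite3


def run_cluster_job(query):
    """Hypothetical stand-in for an operation on the Hadoop cluster.

    The abstract does not describe the real interface; this stub just
    looks the query up in a tiny fake archive.
    """
    fake_archive = {
        "pages:election": {"page1", "page2", "page3"},
        "pages:economy": {"page2", "page3", "page4"},
    }
    return fake_archive.get(query, set())


class CellCache:
    """SQLite-backed cache mapping a query string to a set of object ids."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(query TEXT PRIMARY KEY, members TEXT)"
        )
        self.cluster_calls = 0  # counts cache misses, for illustration

    def fetch(self, query):
        row = self.conn.execute(
            "SELECT members FROM cache WHERE query = ?", (query,)
        ).fetchone()
        if row is not None:
            # Cache hit: rebuild the set from its stored JSON list.
            return set(json.loads(row[0]))
        # Cache miss: fall back to the (stubbed) cluster, then store the result.
        self.cluster_calls += 1
        members = run_cluster_job(query)
        self.conn.execute(
            "INSERT INTO cache (query, members) VALUES (?, ?)",
            (query, json.dumps(sorted(members))),
        )
        self.conn.commit()
        return set(members)


def intersect(cell_a, cell_b):
    """A formula in this sketch is an ordinary set operation on two cells."""
    return cell_a & cell_b


cache = CellCache()
a1 = cache.fetch("pages:election")        # miss: goes to the stub cluster
b1 = cache.fetch("pages:economy")         # miss: goes to the stub cluster
c1 = intersect(a1, b1)                    # formula cell holding a set
a1_again = cache.fetch("pages:election")  # hit: served from SQLite
```

In this sketch each cell holds a Python `set`, so a formula such as `intersect` reduces to a built-in set operation, and only the first request for a given query reaches the stub cluster.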
Item Type: Conference or Workshop Item (Paper)
Projects: Digital Libraries
ID Code: 1038
Deposited By: Andreas Paepcke
Deposited On: 13 Apr 2012 15:40
Last Modified: 17 Apr 2012 08:32
Available Versions of this Item
- ArcSpread for Analyzing Web Archives. (deposited 13 Apr 2012 15:35)
- ArcSpread for Analyzing Web Archives. (deposited 13 Apr 2012 15:40) [Currently Displayed]