Soman, Siddhi and Chharjta, Arti and Bonomo, Alexander and Paepcke, Andreas (2012) ArcSpread for Analyzing Web Archives. In: .
Abstract
We describe an architecture, partial implementation, and
user study for ArcSpread. The vision for ArcSpread is to allow
social scientists of the future, such as historians or political
scientists, to analyze Web archives through a spreadsheet-like
interface. Cells of these spreadsheets contain sets of objects
rather than single items. Examples of object types are Web page,
Image, and Word. Formulas perform set operations on cell
contents. When new content must be acquired, the formulas access an
SQLite cache, or trigger operations on an underlying 60-core Hadoop
cluster. Both the cluster and the spreadsheet formulas have access to a
multi-terabyte Web archive. When users double-click on a loaded
cell, a browser appropriate for the cell content type is raised. We
present an envisioned example interaction, sketch the implemented
Hadoop and spreadsheet-level facilities, and describe the prototype
summarizer.
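
The cell and formula model described in the abstract can be illustrated with a minimal Python sketch. It is not the ArcSpread implementation: the names `SetCell`, `CachedFetcher`, `fake_cluster_job`, and the query strings are all hypothetical, and the Hadoop cluster is replaced by an in-process stand-in function. The sketch only shows the general pattern of set-valued cells, formulas as set operations, and a SQLite cache that is consulted before a backend job is triggered.

```python
import json
import sqlite3


class SetCell:
    """A spreadsheet cell holding a set of objects (plain strings here)."""

    def __init__(self, items=()):
        self.items = frozenset(items)


def intersect(a, b):
    """Formula: objects present in both cells."""
    return SetCell(a.items & b.items)


def union(a, b):
    """Formula: objects present in either cell."""
    return SetCell(a.items | b.items)


class CachedFetcher:
    """Answers queries from a SQLite cache, falling back to a backend job."""

    def __init__(self, backend):
        # In-memory database used purely for illustration.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE cache (query TEXT PRIMARY KEY, result TEXT)"
        )
        self.backend = backend
        self.backend_calls = 0

    def fetch(self, query):
        row = self.db.execute(
            "SELECT result FROM cache WHERE query = ?", (query,)
        ).fetchone()
        if row is not None:
            # Cache hit: no backend job is triggered.
            return SetCell(json.loads(row[0]))
        # Cache miss: run the (assumed) backend job and store its result.
        self.backend_calls += 1
        items = sorted(self.backend(query))
        self.db.execute(
            "INSERT INTO cache (query, result) VALUES (?, ?)",
            (query, json.dumps(items)),
        )
        self.db.commit()
        return SetCell(items)


def fake_cluster_job(query):
    """Hypothetical stand-in for a cluster job over the Web archive."""
    archive = {
        "pages:election": {"a.html", "b.html", "c.html"},
        "pages:economy": {"b.html", "c.html", "d.html"},
    }
    return archive.get(query, set())
```

In this sketch, two cells would be filled with `fetcher.fetch("pages:election")` and `fetcher.fetch("pages:economy")`, and a third cell would hold `intersect(a1, b1)`. Repeating a fetch is then served from the cache without another backend call.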
Item Type: Conference or Workshop Item (Paper)
Projects: Digital Libraries
ID Code: 1037
Deposited By: Andreas Paepcke
Deposited On: 13 Apr 2012 15:35
Last Modified: 17 Apr 2012 08:32