Soman, Siddhi and Chharjta, Arti and Bonomo, Alexander and Paepcke, Andreas (2012) ArcSpread for Analyzing Web Archives. In: .
|PDF (ArcSpread spreadsheet metaphor for Web archive exploration.) - Submitted for Publication|
We describe an architecture, partial implementation, and user study for ArcSpread. The vision for ArcSpread is to allow social scientists of the future, such as historians or political scientists, to analyze Web archives through a spreadsheet-like interface. Cells of these spreadsheets contain sets of objects rather than single items. Examples of object types are Web page, Image, and Word. Formulas perform set operations on cell contents. When new content must be acquired, the formulas access an SQLite cache or trigger operations on an underlying 60-core Hadoop cluster. Both the cluster and the spreadsheet formulas have access to a multi-terabyte Web archive. When users double-click on a loaded cell, a browser appropriate for the cell's content type is raised. We present an envisioned example interaction, sketch the implemented Hadoop and spreadsheet-level facilities, and describe the prototype summarizer.
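The cache-then-cluster behavior described in the abstract can be illustrated with a minimal Python sketch. It is not ArcSpread's implementation: the names `SetCache`, `run_cluster_job`, and `load_cell`, the formula strings, and the tiny in-memory "archive" are all hypothetical, and the Hadoop job is replaced by a local stub. The sketch only shows cells holding sets, a formula combining them with a set operation, and an SQLite lookup that avoids repeating an expensive job.

```python
import json
import sqlite3


class SetCache:
    """SQLite-backed cache mapping a formula string to a set of objects (hypothetical)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (formula TEXT PRIMARY KEY, members TEXT)"
        )

    def get(self, formula):
        row = self.db.execute(
            "SELECT members FROM cache WHERE formula = ?", (formula,)
        ).fetchone()
        return None if row is None else set(json.loads(row[0]))

    def put(self, formula, members):
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?)",
            (formula, json.dumps(sorted(members))),
        )
        self.db.commit()


def run_cluster_job(formula):
    """Hypothetical stand-in for a Hadoop job over the Web archive."""
    fake_archive = {
        "words(page1)": {"web", "archive"},
        "words(page2)": {"archive", "spreadsheet"},
    }
    return fake_archive[formula]


def load_cell(formula, cache, stats):
    """Return the set for a cell, consulting the cache before the cluster."""
    members = cache.get(formula)
    if members is None:
        members = run_cluster_job(formula)  # expensive path, taken on a cache miss
        cache.put(formula, members)
        stats["cluster_jobs"] += 1
    return members


cache = SetCache()
stats = {"cluster_jobs": 0}
a1 = load_cell("words(page1)", cache, stats)        # cell A1: a set of Word objects
b1 = load_cell("words(page2)", cache, stats)        # cell B1: another set
c1 = a1 & b1                                        # cell C1: formula as a set intersection
a1_again = load_cell("words(page1)", cache, stats)  # served from the SQLite cache
```

In this sketch the second request for `words(page1)` is answered from SQLite, so the stubbed cluster job runs only once per distinct formula.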
|Item Type:||Conference or Workshop Item (Paper)|
|Deposited By:||Andreas Paepcke|
|Deposited On:||13 Apr 2012 15:35|
|Last Modified:||17 Apr 2012 08:32|