Stanford InfoLab Publication Server

Profiler: Integrated Statistical Analysis and Visualization for Data Quality Assessment

Kandel, Sean and Parikh, Ravi and Paepcke, Andreas and Hellerstein, Joseph M. and Heer, Jeffrey (2012) Profiler: Integrated Statistical Analysis and Visualization for Data Quality Assessment. In: AVI '12.

Abstract

Data quality issues such as missing, erroneous, extreme and duplicate values undermine analysis and are time-consuming to find and fix. Automated methods can help identify anomalies, but determining what constitutes an error is context-dependent and so requires human judgment. While visualization tools can facilitate this process, analysts must often manually construct the necessary views, requiring significant expertise. We present Profiler, a visual analysis tool for assessing quality issues in tabular data. Profiler applies data mining methods to automatically flag problematic data and suggests coordinated summary visualizations for assessing the data in context. The system contributes novel methods for integrated statistical and visual analysis, automatic view suggestion, and scalable visual summaries that support real-time interaction with millions of data points. We present Profiler’s architecture, including modular components for custom data types, anomaly detection routines and summary visualizations, and describe its application to motion picture, natural disaster and water quality data sets.
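To make the idea of automatically flagging problematic data concrete, the following is a minimal sketch in Python. It is not Profiler's implementation: the function name `flag_column`, the `z_cutoff` parameter, and the choice of a median-based (MAD) outlier score are all assumptions made for illustration. It flags missing, repeated and extreme values in a single numeric column and leaves the decision about whether a flagged value is actually an error to the analyst.

```python
from collections import Counter
from statistics import median


def flag_column(values, z_cutoff=3.5):
    """Return {row_index: [issue, ...]} for one numeric column.

    Hypothetical helper for illustration only. None marks a missing value.
    """
    present = [v for v in values if v is not None]
    counts = Counter(present)

    # Robust center and spread: median and median absolute deviation (MAD).
    med = median(present) if present else None
    mad = median(abs(v - med) for v in present) if present else 0

    flags = {}
    for i, v in enumerate(values):
        issues = []
        if v is None:
            issues.append("missing")
        else:
            if counts[v] > 1:
                # Repeated values are only candidates for duplicates.
                issues.append("duplicate")
            # Modified z-score; 0.6745 rescales MAD to be comparable
            # to a standard deviation for normally distributed data.
            if mad > 0 and 0.6745 * abs(v - med) / mad > z_cutoff:
                issues.append("extreme")
        if issues:
            flags[i] = issues
    return flags


column = [10.0, 11.0, None, 10.0, 12.0, 500.0]
print(flag_column(column))
```

The flags are candidates only. As the abstract notes, whether a value is an error depends on context, so a tool in this style would surface the flagged rows in a summary view rather than delete them.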

Item Type: Conference or Workshop Item (Paper)
Projects: Miscellaneous
ID Code: 1040
Deposited By: Sean Kandel
Deposited On: 03 Jun 2013 08:48
Last Modified: 03 Jun 2013 08:48