Stanford InfoLab Publication Server

Distributed Top-K Monitoring

Babcock, Brian and Olston, Chris (2003) Distributed Top-K Monitoring. In: ACM International Conference on Management of Data (SIGMOD 2003), June 9-12, 2003, San Diego, California.




Abstract

The querying and analysis of data streams has been a topic of much recent interest, motivated by applications from the fields of networking, web usage analysis, sensor instrumentation, telecommunications, and others. Many of these applications involve monitoring answers to continuous queries over data streams produced at physically distributed locations, and most previous approaches require streams to be transmitted to a single location for centralized processing. Unfortunately, the continual transmission of a large number of rapid data streams to a central location can be impractical or expensive. We study a useful class of queries that continuously report the k largest values obtained from distributed data streams ("top-k monitoring queries"), which are of particular interest because they can be used to reduce the overhead incurred while running other types of monitoring queries. We show that transmitting entire data streams is unnecessary to support these queries and present an alternative approach that reduces communication significantly. In our approach, arithmetic constraints are maintained at remote stream sources to ensure that the most recently provided top-k answer remains valid to within a user-specified error tolerance. Distributed communication is only necessary on occasion, when constraints are violated, and we show empirically through extensive simulation on real-world data that our approach reduces overall communication cost by an order of magnitude compared with alternatives that offer the same error guarantees.
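The abstract describes the constraint-based approach only at a high level. The Python sketch below is a rough illustration under stated assumptions, not the paper's exact protocol. It assumes each object's monitored value is the sum of partial values held at the distributed nodes, and that each node keeps per-object adjustment factors whose values for a given object sum to zero across nodes. The names `global_top_k`, `local_constraints_hold`, `adjust`, and `epsilon` are hypothetical. The local check is one plausible form of an "arithmetic constraint": every current top-k object must still outrank every other object at this node once adjustments are applied, allowing a slack of `epsilon`.

```python
from typing import Dict, List, Set


def global_top_k(node_values: List[Dict[str, float]], k: int) -> List[str]:
    """Centralized baseline: sum partial values across nodes, return k largest.

    Ties are broken by object name so the result is deterministic.
    """
    totals: Dict[str, float] = {}
    for values in node_values:
        for obj, v in values.items():
            totals[obj] = totals.get(obj, 0.0) + v
    return sorted(totals, key=lambda o: (-totals[o], o))[:k]


def local_constraints_hold(
    values: Dict[str, float],   # this node's partial value per object (missing = 0)
    adjust: Dict[str, float],   # hypothetical per-object adjustment factors
    top_k: Set[str],            # current top-k answer held by the coordinator
    epsilon: float = 0.0,       # hypothetical per-node slack for the error tolerance
) -> bool:
    """Check, using only local data, that every top-k object still outranks
    every non-top-k object after adjustments (within slack epsilon)."""
    objects = set(values) | set(top_k)
    adjusted = {o: values.get(o, 0.0) + adjust.get(o, 0.0) for o in objects}
    inside = [adjusted[o] for o in top_k]
    outside = [a for o, a in adjusted.items() if o not in top_k]
    if not inside or not outside:
        return True  # nothing to compare, so no constraint can be violated
    # The smallest adjusted top-k value must not fall below the largest other value.
    return min(inside) + epsilon >= max(outside)


if __name__ == "__main__":
    node1 = {"a": 10.0, "b": 4.0, "c": 3.0}
    node2 = {"a": 2.0, "b": 7.0, "c": 1.0}
    answer = global_top_k([node1, node2], k=2)
    print(answer)  # ['a', 'b']
    print(local_constraints_hold(node2, {}, set(answer)))  # True
    node2["c"] = 5.0  # a local update at node2
    print(local_constraints_hold(node2, {}, set(answer)))  # False: notify coordinator
```

If every node's check passes with `epsilon = 0` and each object's adjustments sum to zero across nodes, adding the per-node inequalities shows that every top-k object's total is at least every other object's total, so the reported answer stays valid with no communication. A failed check is the point at which a node would contact the coordinator. With a per-node slack `epsilon` over n nodes, the summed guarantee loosens by at most n times `epsilon`.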

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: top-k, online monitoring, continuous queries, data streams
Subjects: Computer Science > Data Streams; Computer Science > Distributed Systems
Projects: STREAM, TRAPP
Related URLs: STREAM project homepage, http://infolab.stanford.edu/stream/; TRAPP project homepage, http://infolab.stanford.edu/trapp/
ID Code: 748
Deposited By: Import Account
Deposited On: 27 Feb 2003 16:00
Last Modified: 24 Dec 2008 08:37
