Srivastava, Utkarsh and Widom, Jennifer (2004) Memory-Limited Execution of Windowed Stream Joins. Technical Report. Stanford InfoLab.
We address the problem of computing approximate answers to continuous sliding-window joins over data streams when the available memory may be insufficient to keep the entire join state. One approximation scenario is to provide a maximum subset of the result, with the objective of losing as few result tuples as possible. An alternative scenario is to provide a random sample of the join result, e.g., if the output of the join is being aggregated. We show formally that neither approximation can be addressed effectively for a sliding-window join of arbitrary input streams. Previous work has addressed only the maximum-subset problem, and has implicitly used a frequency-based model of stream arrival. We address the sampling problem for this model. More importantly, we point out a broad class of applications for which an age-based model of stream arrival is more appropriate, and we address both approximation scenarios under this new model. Finally, for the case of multiple joins being executed with an overall memory constraint, we provide an algorithm for memory allocation across the joins that optimizes a combined measure of approximation in all scenarios considered. All of our algorithms are implemented and experimental results demonstrate their effectiveness.
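As a rough illustration of the maximum-subset scenario under a frequency-based arrival model, the sketch below joins two streams over a time-based sliding window with a fixed per-stream memory budget. It is not the report's algorithm. The function name `windowed_join`, its parameters, and the eviction rule (drop the stored tuple whose key has been seen least often on the opposite stream, oldest first on ties) are all assumptions made for this example.

```python
from collections import Counter


def windowed_join(events, window, capacity=None):
    """Equi-join streams "R" and "S" over a time-based sliding window.

    events   -- iterable of (timestamp, stream, key), in timestamp order
    window   -- a stored tuple can join arrivals less than `window` time units later
    capacity -- max tuples stored per stream; None means unlimited memory
    Returns a list of result tuples (r_timestamp, s_timestamp, key).
    """
    state = {"R": [], "S": []}               # stored (timestamp, key) per stream
    seen = {"R": Counter(), "S": Counter()}  # key frequencies observed so far
    results = []
    for ts, stream, key in events:
        other = "S" if stream == "R" else "R"
        seen[stream][key] += 1
        # Expire tuples that have slid out of the window.
        for name in state:
            state[name] = [(t, k) for (t, k) in state[name] if ts - t < window]
        # Probe the opposite stream's stored tuples for matching keys.
        for t, k in state[other]:
            if k == key:
                results.append((ts, t, key) if stream == "R" else (t, ts, key))
        # Store the new tuple, then shed load if the budget is exceeded.
        state[stream].append((ts, key))
        if capacity is not None and len(state[stream]) > capacity:
            # Assumed heuristic: evict the tuple least likely to find a
            # partner, judged by how often its key appeared on the other
            # stream; ties go to the oldest tuple.
            victim = min(state[stream], key=lambda tk: (seen[other][tk[1]], tk[0]))
            state[stream].remove(victim)
    return results
```

Running the same input with `capacity=None` and with a small `capacity` shows how many result tuples are lost to load shedding. In this sketch the memory-limited output is always a subset of the exact join result, because shedding only removes stored tuples and never creates new matches.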
|Item Type:||Techreport (Technical Report)|
|Uncontrolled Keywords:||streams, joins, approximation, memory-limited, load-shedding|
|Subjects:||Computer Science > Data Streams|
|Related URLs:||Project Homepage: http://infolab.stanford.edu/stream/|
|Deposited By:||Import Account|
|Deposited On:||08 Mar 2004 16:00|
|Last Modified:||23 Dec 2008 09:50|