Stanford InfoLab Publication Server

Patterns of Temporal Variation in Online Media

Yang, Jaewon and Leskovec, Jure. Patterns of Temporal Variation in Online Media. In: ACM International Conference on Web Search and Data Mining (WSDM), 09-12 Feb 2011, Hong Kong, China.


PDF (Patterns of Temporal Variation in Online Media) - Accepted Version


Online content exhibits rich temporal dynamics, and diverse real-time user-generated content further intensifies this process. However, the temporal patterns by which online content grows and fades over time, and by which different pieces of content compete for attention, remain largely unexplored. We study temporal patterns associated with online content and how the content's popularity grows and fades over time. The attention that content receives on the Web varies depending on many factors and occurs on very different time scales and at different resolutions. In order to uncover the temporal dynamics of online content, we formulate a time series clustering problem using a similarity metric that is invariant to scaling and shifting. We develop the K-Spectral Centroid (K-SC) clustering algorithm, which effectively finds cluster centroids under our similarity measure. By applying an adaptive wavelet-based incremental approach to clustering, we scale K-SC to large datasets. We demonstrate our approach on two massive datasets: a set of 580 million Tweets, and a set of 170 million blog posts and news media articles. We find that K-SC outperforms the K-means clustering algorithm in finding distinct shapes of time series. Our analysis shows that there are six main temporal shapes of attention of online content. We also present a simple model that reliably predicts the shape of attention by using information about only a small number of participants. Our analyses offer insight into common temporal patterns of the content on the Web and broaden the understanding of the dynamics of human attention.
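As an illustration of what a similarity measure "invariant to scaling and shifting" can look like, the following is a minimal sketch. It assumes a distance of the form min over scale alpha and shift q of ||x - alpha * y_q|| / ||x||, where y_q is y shifted by q time steps; the abstract does not state the formula, so the exact definition used in the paper may differ. The function names `shift` and `invariant_distance` and the `max_shift` parameter are hypothetical, and the zero-padded shift is an assumption of this sketch.

```python
import numpy as np


def shift(y, q):
    """Shift series y by q steps (right if q > 0, left if q < 0), zero-padded."""
    out = np.zeros_like(y)
    if q > 0:
        out[q:] = y[:-q]
    elif q < 0:
        out[:q] = y[-q:]
    else:
        out[:] = y
    return out


def invariant_distance(x, y, max_shift=None):
    """Assumed scale- and shift-invariant distance between two equal-length series.

    For each candidate shift q, the best scale alpha has a closed form
    (least squares): alpha = <x, y_q> / <y_q, y_q>.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("series must have the same length")
    norm_x = np.linalg.norm(x)
    if norm_x == 0:
        raise ValueError("x must not be all zeros")
    if max_shift is None:
        max_shift = len(x) - 1

    best = 1.0  # alpha = 0 always achieves ||x|| / ||x|| = 1
    for q in range(-max_shift, max_shift + 1):
        y_q = shift(y, q)
        denom = y_q @ y_q
        if denom == 0:
            continue  # shifted series is all zeros, nothing to fit
        alpha = (x @ y_q) / denom
        best = min(best, np.linalg.norm(x - alpha * y_q) / norm_x)
    return best


# Same peak shape, but twice as tall and two steps later: distance is ~0.
x = [0, 1, 3, 1, 0, 0]
y = [0, 0, 0, 2, 6, 2]
same_shape = invariant_distance(x, y)

# A flat series has a different shape, so the distance stays well above 0.
flat = invariant_distance(x, [1, 1, 1, 1, 1, 1])
```

Under such a measure, two pieces of content whose attention curves peak at different times or reach different volumes can still be grouped together when the curves have the same shape, which is the property a shape-based clustering needs.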

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Information Diffusion, Online Media, Twitter, Memetracker, Time-Series, Clustering
ID Code: 984
Deposited By: Jaewon Yang
Deposited On: 27 Sep 2010 16:17
Last Modified: 19 Nov 2010 14:09
