Stanford InfoLab Publication Server

k-means++: The Advantages of Careful Seeding

Arthur, David and Vassilvitskii, Sergei (2006) k-means++: The Advantages of Careful Seeding. Technical Report. Stanford.




The k-means method is a widely used clustering technique that seeks to minimize the average squared distance between points in the same cluster. Although it offers no accuracy guarantees, its simplicity and speed are very appealing in practice. By augmenting k-means with a simple, randomized seeding technique, we obtain an algorithm that is $O(\log k)$-competitive with the optimal clustering. Experiments show our augmentation improves both the speed and the accuracy of k-means, often quite dramatically.

Item Type:Techreport (Technical Report)
Uncontrolled Keywords:k-means, clustering, seeding
Subjects:Computer Science > Data Mining
ID Code:778
Deposited By:Import Account
Deposited On:06 Jun 2006 17:00
Last Modified:18 Dec 2008 14:38

Download statistics

Repository Staff Only: item control page