Scalable K-Means++
Summary: Introduces k-means||, a parallel initialization procedure derived from k-means++ that needs only a logarithmic number of passes over the data instead of one sequential pass per center. The paper proves that k-means|| reaches a near-optimal solution after a logarithmic number of rounds and shows that a constant number of passes suffices in practice. In the experiments, k-means|| outperforms k-means++ in both sequential and parallel settings. (summarized by gpt-5-nano on Feb 09 2026)
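The summary gives no algorithmic detail, so the following is only a minimal single-machine sketch of how a k-means||-style initialization is commonly described: oversample candidate centers for a few rounds, weight each candidate by the number of points nearest to it, then reduce the weighted candidates to k centers with a weighted k-means++ pass. The function names, the default oversampling factor of 2k, and the default of 5 rounds are illustrative assumptions, not values taken from this page.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_sq_dist(p, centers):
    """Squared distance from p to its closest center."""
    return min(sq_dist(p, c) for c in centers)

def weighted_kmeanspp(cands, weights, k, rng):
    """Weighted k-means++ seeding over the (small) candidate set."""
    chosen = [rng.choices(cands, weights=weights, k=1)[0]]
    while len(chosen) < k:
        scores = [w * nearest_sq_dist(c, chosen) for c, w in zip(cands, weights)]
        if sum(scores) == 0:  # no remaining distinct candidate
            break
        chosen.append(rng.choices(cands, weights=scores, k=1)[0])
    return chosen

def kmeans_parallel_init(points, k, oversample=None, rounds=5, seed=0):
    """Sketch of a k-means||-style initialization (hypothetical helper).

    oversample and rounds are assumed defaults chosen for illustration.
    May return fewer than k centers if too few distinct candidates exist.
    """
    rng = random.Random(seed)
    ell = oversample if oversample is not None else 2 * k
    centers = [rng.choice(points)]  # one uniformly random starting center
    for _ in range(rounds):
        d2 = [nearest_sq_dist(p, centers) for p in points]
        cost = sum(d2)
        if cost == 0:
            break
        # Each point is sampled independently of the others, which is
        # what lets a round run in parallel over partitions of the data.
        sampled = [p for p, d in zip(points, d2)
                   if rng.random() < min(1.0, ell * d / cost)]
        centers.extend(sampled)
    # Weight each candidate by how many points are closest to it.
    weights = [0] * len(centers)
    for p in points:
        j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        weights[j] += 1
    return weighted_kmeanspp(centers, weights, k, rng)

if __name__ == "__main__":
    # Three well-separated groups of ten points each.
    data = [(cx + i % 5, cy + i // 5)
            for cx, cy in [(0, 0), (100, 100), (200, 0)]
            for i in range(10)]
    print(kmeans_parallel_init(data, k=3, seed=1))
```

The returned centers would then seed ordinary Lloyd iterations. In a distributed setting, the per-point distance computation and sampling inside each round are the parts that would be spread across workers, while the final reduction runs on the small candidate set.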
Authors
1. Bahman Bahmani
2. Benjamin Moseley
3. Andrea Vattani
4. Ravi Kumar
5. Sergei Vassilvitskii
Incoming Citations (Sorted by Pagerank)
Showing 13 of 13 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 33 | BIRCH: An Efficient Data Clustering Method for Very Large Databases | 1996 | SIGMOD | 0.00077324389 |
| 341 | CURE: An Efficient Clustering Algorithm for Large Databases | 1998 | SIGMOD | 0.00026810548 |
| 644 | Densest Subgraph in Streaming and MapReduce | 2012 | VLDB | 0.00018748988 |
| 886 | Fast Personalized PageRank on MapReduce | 2011 | SIGMOD | 0.00015597161 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,466 | Fast Density-Peaks Clustering: Multicore-based Parallelization Approach | 2021 | SIGMOD | 4.1945683e-05 |
| 5,417 | Theoretically-Efficient and Practical Parallel DBSCAN | 2020 | SIGMOD | 5.5194222e-05 |
| 11,852 | K-means Split Revisited: Well-grounded Approach and Experimental Evaluation | 2016 | SIGMOD | 4.1945683e-05 |
| 10,317 | Highly-Efficient Large-Scale k-means with Individual Fairness | 2026 | VLDB | 4.1945683e-05 |
| 10,971 | Settling Time vs. Accuracy Tradeoffs for Clustering Big Data | 2024 | SIGMOD | 4.1945683e-05 |
| 12,571 | k-Means Projective Clustering | 2004 | PODS | 4.1945683e-05 |
| 9,420 | Local Search Methods for k-Means with Outliers | 2017 | VLDB | 4.3441378e-05 |
| 4,083 | Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially | 2019 | VLDB | 6.4638932e-05 |
| 4,652 | On the Efficiency of K-Means Clustering: Evaluation, Optimization, and Algorithm Selection | 2021 | VLDB | 6.0228549e-05 |
| 10,943 | Efficient Algorithm for K-Multiple-Means | 2024 | SIGMOD | 4.1945683e-05 |