Settling Time vs. Accuracy Tradeoffs for Clustering Big Data
Summary: Settles the runtime/accuracy frontier for k-means and k-median on big data. Shows that sensitivity-sampling coresets can be built in near-linear time, refuting the folklore belief that their construction requires superlinear time. Then benchmarks sampling and coreset heuristics in batch and streaming settings to characterize when high-accuracy summaries are worth their cost compared with crude subsampling. (summarized by gpt-5.4-mini on May 24 2026)
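For readers unfamiliar with the term, the following is a minimal sketch of one common textbook variant of sensitivity sampling for k-means, not the paper's own construction. It assumes a rough set of centers is already available (for example from k-means++), scores each point by its share of the total cost plus the inverse of its cluster size, and samples points in proportion to that score. The function name `sensitivity_coreset` and its parameters are hypothetical, and the sketch does not reproduce the near-linear-time result, since it takes the initial centers as given.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sensitivity_coreset(points, centers, m, seed=0):
    """Sample m weighted points with probability proportional to a
    sensitivity upper bound (illustrative variant, assumed here)."""
    rng = random.Random(seed)

    # Assign each point to its nearest center and record that squared distance.
    assign, costs = [], []
    for p in points:
        dists = [sq_dist(p, c) for c in centers]
        j = min(range(len(centers)), key=dists.__getitem__)
        assign.append(j)
        costs.append(dists[j])

    total = sum(costs)
    sizes = [0] * len(centers)
    for j in assign:
        sizes[j] += 1

    # Score = share of total cost + 1 / (size of the point's cluster).
    scores = [
        (c / total if total > 0 else 0.0) + 1.0 / sizes[j]
        for c, j in zip(costs, assign)
    ]
    z = sum(scores)

    # Draw m indices with replacement, proportionally to the scores.
    idx = rng.choices(range(len(points)), weights=scores, k=m)

    # Weight 1 / (m * p_i), with p_i = score_i / z, keeps the weighted
    # cost an unbiased estimate of the full-data cost.
    return [(points[i], z / (m * scores[i])) for i in idx]
```

Crude uniform subsampling corresponds to replacing `scores` with a constant, which is cheaper but can miss small, far-away clusters. That is the kind of cost-versus-accuracy tradeoff the summary describes.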
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 1 of 1 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 33 | BIRCH: An Efficient Data Clustering Method for Very Large Databases | 1996 | SIGMOD | 0.00077324389 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,927 | Computing A Well-Representative Summary of Conjunctive Query Results | 2024 | PODS | 4.1945683e-05 |
| 2,093 | Scalable K-Means++ | 2012 | VLDB | 9.5588104e-05 |
| 4,652 | On the Efficiency of K-Means Clustering: Evaluation, Optimization, and Algorithm Selection | 2021 | VLDB | 6.0228549e-05 |
| 1,595 | Fast Algorithms for Projected Clustering | 1999 | SIGMOD | 0.00011222442 |
| 10,923 | k-Clustering with Comparison and Distance Oracles | 2024 | PODS | 4.1945683e-05 |
| 1,860 | Approximation Algorithms for Clustering Uncertain Data | 2008 | PODS | 0.0001028857 |
| 7,480 | Towards Metric DBSCAN: Exact, Approximate, and Streaming Algorithms | 2024 | SIGMOD | 4.7180617e-05 |
| 3,313 | Quality and Efficiency in Kernel Density Estimates for Large Data | 2013 | SIGMOD | 7.2381634e-05 |
| 10,924 | Improved Approximation Algorithms for Relational Clustering | 2024 | PODS | 4.1945683e-05 |
| 4,083 | Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially | 2019 | VLDB | 6.4638932e-05 |