LOG-Means: Efficiently Estimating the Number of Clusters in Large Datasets
Summary: LOG-Means estimates the number of clusters with sublinear dependence on the size of the search space, which makes parameter tuning fast on large datasets. In experiments on Apache Spark, it outperforms 13 baselines in both runtime and accuracy, in what is presented as the most systematic comparison on large search spaces to date. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Manuel Fritz
2. Michael Behringer
3. Holger Schwarz
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,716 | Federated and Balanced Clustering for High-dimensional Data | 2025 | VLDB | 4.1945683e-05 |
| 11,045 | Ensemble Clustering based on Meta-Learning and Hyperparameter Optimization | 2024 | VLDB | 4.1945683e-05 |
| 13,184 | ML2DAC: Meta-learning to Democratize AutoML for Clustering Analyses | 2023 | SIGMOD | - |
Outgoing Citations (Sorted by Pagerank)
Showing 2 of 2 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,093 | Scalable K-Means++ | 2012 | VLDB | 9.5588104e-05 |
| 4,083 | Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially | 2019 | VLDB | 6.4638932e-05 |