Database Paper Browser

LOG-Means: Efficiently Estimating the Number of Clusters in Large Datasets

Summary: LOG-Means estimates the optimal number of clusters at a cost that grows sublinearly with the size of the search space, which makes tuning fast on large datasets, including on Apache Spark. In Apache Spark experiments it outperforms 13 baselines in both runtime and accuracy, and the evaluation is described as the most systematic comparison over large search spaces to date. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12103
Venue: VLDB
Year: 2020
Pagerank: 4.4039656e-05
Overall Rank: 9,053 | 37.03%
DOI: 10.14778/3407790.3407813

Authors

Incoming Citations (Sorted by Pagerank)

Showing 3 of 3 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
2,093 | Scalable K-Means++ | 2012 | VLDB | 9.5588104e-05
4,083 | Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially | 2019 | VLDB | 6.4638932e-05

Semantically Similar Papers