Finding Near Neighbors Through Cluster Pruning
Summary: Cluster pruning: pick random leaders, partition points by nearest leader, and search only the clusters of the closest leader(s) (with optional recursion) to obtain approximate nearest neighbors. Formalizes cost–accuracy tradeoffs, gives provable guarantees under generalized Gaussian-mixtures, and demonstrates orders-of-magnitude query speedups with modest quality loss, outperforming p-spheres in RAM and external-memory settings. (summarized by gpt-5-mini on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549 |
| 2,024 | ATLAS: A Probabilistic Algorithm for High Dimensional Similarity Search | 2011 | SIGMOD | 9.7519678e-05 |
Previous
Page 1 / 1
Next
Outgoing Citations (Sorted by Pagerank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2 | R-Trees: A Dynamic Index Structure For Spatial Searching | 1984 | SIGMOD | 0.0032169493 |
| 34 | Similarity Search in High Dimensions via Hashing | 1999 | VLDB | 0.00076637636 |
| 129 | The X-tree: An Index Structure for High-Dimensional Data | 1996 | VLDB | 0.0004429571 |
| 284 | The SR-tree: An Index Structure for High-Dimensional Nearest Neighbor Queries | 1997 | SIGMOD | 0.00028994728 |
| 575 | Distance-Based Indexing For High-Dimensional Metric Spaces | 1997 | SIGMOD | 0.00019882723 |
| 708 | Near Neighbor Search in Large Metric Spaces | 1995 | VLDB | 0.00017772684 |
| 709 | Efficient Similarity Search and Classification via Rank Aggregation | 2003 | SIGMOD | 0.00017768547 |
| 3,061 | Contrast Plots and P-Sphere Trees: Space vs. Time in Nearest Neighbor Searches | 2000 | VLDB | 7.6382127e-05 |
Previous
Page 1 / 1
Next
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,555 | Fast Parallel Similarity Search in Multimedia Databases | 1997 | SIGMOD | 6.9772546e-05 |
| 11,595 | Minimization of Classifier Construction Cost for Search Queries | 2020 | SIGMOD | 4.1945683e-05 |
| 1,931 | Efficient Processing of k Nearest Neighbor Joins using MapReduce | 2012 | VLDB | 0.00010040427 |
| 4,724 | Nearest-Neighbor Searching Under Uncertainty | 2012 | PODS | 5.9697823e-05 |
| 1,183 | A Cost Model For Nearest Neighbor Search in High-Dimensional Data Space | 1997 | PODS | 0.00013455596 |
| 7,084 | Nearest Neighbor Searching Under Uncertainty II | 2013 | PODS | 4.839879e-05 |
| 1,542 | Efficient Search for the Top-k Probable Nearest Neighbors in Uncertain Databases | 2008 | VLDB | 0.00011456321 |
| 10,165 | Fast-Convergent Proximity Graphs for Approximate Nearest Neighbor Search | 2026 | SIGMOD | 4.1945683e-05 |
| 10,073 | Efficient Approximate Nearest Neighbor Search via Hemi-Sphere Centroids Graph | 2026 | SIGMOD | 4.1945683e-05 |
| 709 | Efficient Similarity Search and Classification via Rank Aggregation | 2003 | SIGMOD | 0.00017768547 |