ClusterJoin: A Similarity Joins Framework using Map-Reduce
Summary: ClusterJoin is a MapReduce framework for scalable similarity joins that partitions data according to its distribution and routes each record to the partitions relevant to it. Bisector-based candidate filters, combined with sampling-driven load balancing, deliver probabilistic guarantees and robust scalability for high-dimensional, low-threshold data. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Akash Das Sarma
2. Yeye He
3. Surajit Chaudhuri
Incoming Citations (Sorted by Pagerank)
Showing 21 of 21 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171 |
| 1,074 | Processing Theta-Joins using MapReduce | 2011 | SIGMOD | 0.00014260096 |
| 1,305 | Bayesian Locality Sensitive Hashing for Fast Similarity Search | 2012 | VLDB | 0.00012687101 |
| 1,715 | V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors | 2012 | VLDB | 0.00010803271 |
| 4,147 | Exploiting MapReduce-based Similarity Joins | 2012 | SIGMOD | 6.4096022e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,890 | Let's Rethink Join Optimization in Distributed Systems | 2015 | CIDR | 4.1945683e-05 |
| 10,068 | DiskJoin: Large-scale Vector Similarity Join with SSD | 2026 | SIGMOD | 4.1945683e-05 |
| 15 | Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters | 2007 | SIGMOD | 0.0010654262 |
| 6,507 | Similarity Join over Array Data | 2016 | SIGMOD | 5.0337166e-05 |
| 960 | A Comparison of Join Algorithms for Log Processing in MapReduce | 2010 | SIGMOD | 0.00015012242 |
| 10,930 | Similarity Joins of Sparse Features | 2024 | SIGMOD | 4.1945683e-05 |
| 1,931 | Efficient Processing of k Nearest Neighbor Joins using MapReduce | 2012 | VLDB | 0.00010040427 |
| 4,775 | Set Similarity Joins on MapReduce: An Experimental Survey | 2018 | VLDB | 5.9315784e-05 |
| 447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171 |
| 4,147 | Exploiting MapReduce-based Similarity Joins | 2012 | SIGMOD | 6.4096022e-05 |