V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors
Summary: V-SMART-Join is a scalable MapReduce framework for all-pair similarity joins over multisets, sets, and vectors, using a two-stage compute-and-filter pipeline to handle skew. It delivers up to 30x speedups over VCL and scales to large IP-cookie datasets for proxy detection. (summarized by gpt-5-nano on Feb 09 2026)
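As a rough single-machine illustration of the two-stage compute-and-filter idea in the summary, the sketch below joins multisets under Ruzicka similarity (sum of per-element minima over sum of per-element maxima). It is not the paper's MapReduce implementation. The function name `similarity_join`, the choice of Ruzicka similarity, the in-memory inverted index, and the sample IP/cookie data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def similarity_join(multisets, threshold):
    """Return {(a, b): similarity} for all pairs with similarity >= threshold.

    `multisets` maps a name to a dict of element -> multiplicity.
    Hypothetical helper, not the paper's API.
    """
    # Stage 1a: per-multiset statistic (total cardinality).
    sizes = {name: sum(counts.values()) for name, counts in multisets.items()}

    # Stage 1b: inverted index from element to (name, multiplicity) postings.
    index = defaultdict(list)
    for name, counts in multisets.items():
        for element, mult in counts.items():
            index[element].append((name, mult))

    # Accumulate the partial result sum(min(a_i, b_i)) for every pair
    # of multisets that share at least one element.
    overlap = defaultdict(int)
    for postings in index.values():
        for (a, mult_a), (b, mult_b) in combinations(sorted(postings), 2):
            overlap[(a, b)] += min(mult_a, mult_b)

    # Stage 2: combine the partial results into a similarity and filter.
    # sum(max(a_i, b_i)) = |A| + |B| - sum(min(a_i, b_i)).
    result = {}
    for (a, b), inter in overlap.items():
        union = sizes[a] + sizes[b] - inter
        sim = inter / union
        if sim >= threshold:
            result[(a, b)] = sim
    return result


# Made-up data: IPs as multisets of cookies.
sample = {
    "ip1": {"c1": 2, "c2": 1},
    "ip2": {"c1": 1, "c2": 1},
    "ip3": {"c3": 5},
}
pairs = similarity_join(sample, threshold=0.5)
```

Here only `ip1` and `ip2` share elements, so they form the only candidate pair. A very frequent element produces a long postings list and therefore a quadratic number of pairs, which is the kind of skew the summary says the framework is designed to handle.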
Incoming Citations (Sorted by Pagerank)
Showing 24 of 24 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614 |
| 70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171 |
| 6,362 | SLEUTH: Single-publisher attack detection Using correlation Hunting | 2008 | VLDB | 5.0953013e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,459 | An Empirical Evaluation of Set Similarity Join Techniques | 2016 | VLDB | 7.072508e-05 |
| 10,068 | DiskJoin: Large-scale Vector Similarity Join with SSD | 2026 | SIGMOD | 4.1945683e-05 |
| 3,490 | Leveraging Set Relations in Exact Set Similarity Join | 2017 | VLDB | 7.0465856e-05 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 8,899 | Fast Approximate Similarity Join in Vector Databases | 2025 | SIGMOD | 4.427232e-05 |
| 10,930 | Similarity Joins of Sparse Features | 2024 | SIGMOD | 4.1945683e-05 |
| 4,147 | Exploiting MapReduce-based Similarity Joins | 2012 | SIGMOD | 6.4096022e-05 |
| 447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171 |
| 4,775 | Set Similarity Joins on MapReduce: An Experimental Survey | 2018 | VLDB | 5.9315784e-05 |
| 3,141 | ClusterJoin: A Similarity Joins Framework using Map-Reduce | 2014 | VLDB | 7.4829448e-05 |