Database Paper Browser


V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors

Summary: V-SMART-Join is a scalable MapReduce framework for all-pair similarity joins over multisets, sets, and vectors, using a two-stage compute-and-filter pipeline to handle skew. It delivers up to 30x speedups over VCL and scales to large IP-cookie datasets for proxy detection. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 10513
Venue: VLDB
Year: 2012
Pagerank: 0.00010803271
Overall Rank: 1,715 | 88.08%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 24 of 24 citing papers.

| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639 |
| 1,931 | Efficient Processing of k Nearest Neighbor Joins using MapReduce | 2012 | VLDB | 0.00010040427 |
| 2,674 | Minimal MapReduce Algorithms | 2013 | SIGMOD | 8.3328645e-05 |
| 2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05 |
| 3,129 | Scalable Big Graph Processing in MapReduce | 2014 | SIGMOD | 7.5008242e-05 |
| 3,141 | ClusterJoin: A Similarity Joins Framework using Map-Reduce | 2014 | VLDB | 7.4829448e-05 |
| 3,459 | An Empirical Evaluation of Set Similarity Join Techniques | 2016 | VLDB | 7.072508e-05 |
| 3,490 | Leveraging Set Relations in Exact Set Similarity Join | 2017 | VLDB | 7.0465856e-05 |
| 4,402 | Smurf: Self-Service String Matching Using Random Forests | 2019 | VLDB | 6.2195162e-05 |
| 4,775 | Set Similarity Joins on MapReduce: An Experimental Survey | 2018 | VLDB | 5.9315784e-05 |
| 5,434 | Auto-FuzzyJoin: Auto-Program Fuzzy Similarity Joins Without Labeled Examples | 2021 | SIGMOD | 5.5045402e-05 |
| 6,507 | Similarity Join over Array Data | 2016 | SIGMOD | 5.0337166e-05 |
| 7,588 | Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases | 2013 | VLDB | 4.7030914e-05 |
| 7,668 | Human-in-the-loop Data Integration | 2017 | VLDB | 4.6834075e-05 |
| 8,137 | Customizable and Scalable Fuzzy Join for Big Data | 2019 | VLDB | 4.5774794e-05 |
| 8,899 | Fast Approximate Similarity Join in Vector Databases | 2025 | SIGMOD | 4.427232e-05 |
| 9,115 | MapReduce Algorithms for Big Data Analysis | 2012 | VLDB | 4.3932167e-05 |
| 9,832 | Balance-Aware Distributed String Similarity-Based Query Processing System | 2019 | VLDB | 4.2751057e-05 |
| 10,930 | Similarity Joins of Sparse Features | 2024 | SIGMOD | 4.1945683e-05 |
| 11,358 | Scaling Equi-Joins | 2022 | SIGMOD | 4.1945683e-05 |
| 11,504 | LES3: Learning-based Exact Set Similarity Search | 2021 | VLDB | 4.1945683e-05 |
| 11,724 | ZigZag: Supporting Similarity Queries on Vector Space Models | 2018 | SIGMOD | 4.1945683e-05 |
| 11,929 | Processing of Probabilistic Skyline Queries Using MapReduce | 2015 | VLDB | 4.1945683e-05 |
| 12,075 | PLASMA-HD: Probing the LAttice Structure and MAkeup of High-dimensional Data | 2013 | VLDB | 4.1945683e-05 |

Outgoing Citations (Sorted by Pagerank)

Showing 6 of 6 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers

| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,459 | An Empirical Evaluation of Set Similarity Join Techniques | 2016 | VLDB | 7.072508e-05 |
| 10,068 | DiskJoin: Large-scale Vector Similarity Join with SSD | 2026 | SIGMOD | 4.1945683e-05 |
| 3,490 | Leveraging Set Relations in Exact Set Similarity Join | 2017 | VLDB | 7.0465856e-05 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 8,899 | Fast Approximate Similarity Join in Vector Databases | 2025 | SIGMOD | 4.427232e-05 |
| 10,930 | Similarity Joins of Sparse Features | 2024 | SIGMOD | 4.1945683e-05 |
| 4,147 | Exploiting MapReduce-based Similarity Joins | 2012 | SIGMOD | 6.4096022e-05 |
| 447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171 |
| 4,775 | Set Similarity Joins on MapReduce: An Experimental Survey | 2018 | VLDB | 5.9315784e-05 |
| 3,141 | ClusterJoin: A Similarity Joins Framework using Map-Reduce | 2014 | VLDB | 7.4829448e-05 |