Database Paper Browser


An Empirical Evaluation of Set Similarity Join Techniques

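Summary: An empirical evaluation of seven set-similarity-join algorithms, all following the filter-verification approach, shows that verification, largely overlooked in earlier work, matters for runtime: efficient verification typically inspects only a small, constant number of set elements. Three winners emerge. The prefix filter is the key technique, and AllPairs, the first algorithm to adopt it, remains a relevant competitor. The work repeats experiments from earlier papers and analyzes the factors that determine overall runtime. (summarized by gpt-5-nano on Feb 09 2026)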
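The summary names two building blocks: the prefix filter and a verification step that can stop after a few elements. The following is a minimal Python sketch of how they fit together. It is not the paper's code and makes several assumptions. It uses Jaccard similarity. Tokens are ordered rarest-first. It applies a basic AllPairs-style prefix filter, which is simplified here to index the full probing prefix. It adds a length filter and a merge-based verification with early termination. All function names (`prefix_length`, `required_overlap`, `verify`, `self_join`) are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from math import ceil


def prefix_length(size: int, t: Fraction) -> int:
    # Basic Jaccard prefix: |r| - ceil(t * |r|) + 1 tokens.
    return size - ceil(t * size) + 1


def required_overlap(size_r: int, size_s: int, t: Fraction) -> int:
    # J(r, s) >= t  <=>  |r & s| >= t / (1 + t) * (|r| + |s|)
    return ceil(t / (1 + t) * (size_r + size_s))


def verify(r, s, needed: int) -> bool:
    """Merge two sorted token lists, stopping as soon as the required
    overlap is reached or can no longer be reached."""
    i = j = overlap = 0
    while i < len(r) and j < len(s):
        # Early termination: even if every remaining token matched,
        # the overlap would stay below the bound.
        if overlap + min(len(r) - i, len(s) - j) < needed:
            return False
        if r[i] == s[j]:
            overlap += 1
            if overlap >= needed:
                return True
            i += 1
            j += 1
        elif r[i] < s[j]:
            i += 1
        else:
            j += 1
    return overlap >= needed


def self_join(sets, threshold):
    """Return all index pairs (i, j), i < j, with Jaccard >= threshold."""
    t = Fraction(str(threshold))  # exact arithmetic avoids float rounding in ceil
    # Global token order: rarest first, so prefixes hold infrequent tokens.
    freq = defaultdict(int)
    for s in sets:
        for tok in set(s):
            freq[tok] += 1
    order_of = sorted(freq, key=lambda tok: (freq[tok], tok))
    rank = {tok: k for k, tok in enumerate(order_of)}
    records = [sorted(rank[tok] for tok in set(s)) for s in sets]

    # Process non-empty records by increasing size.
    order = sorted((i for i in range(len(records)) if records[i]),
                   key=lambda i: len(records[i]))
    index = defaultdict(list)  # token rank -> ids whose prefix contains it
    result = []
    for rid in order:
        r = records[rid]
        prefix = r[:prefix_length(len(r), t)]
        candidates = set()
        for tok in prefix:
            for sid in index[tok]:
                # Length filter: indexed records are no longer than r.
                if len(records[sid]) >= t * len(r):
                    candidates.add(sid)
        for tok in prefix:
            index[tok].append(rid)
        for sid in candidates:
            s = records[sid]
            if verify(r, s, required_overlap(len(r), len(s), t)):
                result.append((min(sid, rid), max(sid, rid)))
    return sorted(result)
```

The early-termination check in `verify` shows why verification can be cheap. A candidate pair is usually accepted or rejected long before both lists are fully merged. Filtering only has to be good enough that the remaining candidates are inexpensive to check.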

Paper ID: 11353
Venue: VLDB
Year: 2016
Pagerank: 7.072508e-05
Overall Rank: 3,459 | 75.94%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 21 of 21 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
2,641 | Locality-Sensitive Hashing for Earthquake Detection: A Case Study of Scaling Data-Driven Science | 2018 | VLDB | 8.3905374e-05
2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05
3,490 | Leveraging Set Relations in Exact Set Similarity Join | 2017 | VLDB | 7.0465856e-05
4,278 | Similarity Query Processing for High-Dimensional Data | 2020 | VLDB | 6.2953764e-05
4,353 | Overlap Set Similarity Joins with Theoretical Guarantees | 2018 | SIGMOD | 6.263585e-05
4,402 | Smurf: Self-Service String Matching Using Random Forests | 2019 | VLDB | 6.2195162e-05
4,775 | Set Similarity Joins on MapReduce: An Experimental Survey | 2018 | VLDB | 5.9315784e-05
6,074 | Pigeonring: A Principle for Faster Thresholded Similarity Search | 2019 | VLDB | 5.2242306e-05
8,093 | Scalable Distributed Inverted List Indexes in Disaggregated Memory | 2024 | SIGMOD | 4.5873721e-05
8,511 | JEDI: These aren't the JSON documents you're looking for... | 2022 | SIGMOD | 4.495029e-05
10,245 | SeDA: Bridging the Gap between Efficient Syntactic and Precise Semantic Search of Similar Passages in Large Text Corpora | 2026 | VLDB | 4.1945683e-05
10,706 | Extensible and Robust Evaluation of Similarity Queries | 2025 | VLDB | 4.1945683e-05
10,930 | Similarity Joins of Sparse Features | 2024 | SIGMOD | 4.1945683e-05
10,967 | Low-Latency Adaptive Distributed Stream Join System Based on a Flexible Join Model | 2024 | SIGMOD | 4.1945683e-05
11,087 | Dealing with Acronyms, Abbreviations, and Typos in Real-World Entity Matching | 2024 | VLDB | 4.1945683e-05
11,185 | FINEX: A Fast Index for Exact & Flexible Density-Based Clustering | 2023 | SIGMOD | 4.1945683e-05
11,247 | A Two-Level Signature Scheme for Stable Set Similarity Joins | 2023 | VLDB | 4.1945683e-05
11,305 | TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching | 2023 | VLDB | 4.1945683e-05
11,504 | LES3: Learning-based Exact Set Similarity Search | 2021 | VLDB | 4.1945683e-05
11,724 | ZigZag: Supporting Similarity Queries on Vector Space Models | 2018 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 9 of 9 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers