Overlap Set Similarity Joins with Theoretical Guarantees
Summary: The paper introduces a size-aware algorithm for c-overlap set similarity joins that runs in O(n^{2-1/c} k^{1/(2c)}) time, where n is the total size of all sets and k is the number of results. It partitions the sets into small and large ones by size, applies existing methods to the large sets, and adds optimization heuristics for the small sets together with a size-boundary selection step, yielding strong practical speedups.
(summarized by gpt-5-nano on Feb 09 2026)
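The size-aware idea in the summary can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `overlap_join`, the fixed `boundary` parameter, and the brute-force handling of large sets are assumptions made for illustration, and the paper's small-set heuristics and boundary selection are omitted. Large sets are compared directly against every other set, while small sets are joined through an index over their c-element subsets, since two sets share a c-subset exactly when they overlap in at least c elements.

```python
from collections import defaultdict
from itertools import combinations


def overlap_join(sets, c, boundary):
    """Return sorted pairs (i, j), i < j, with |sets[i] & sets[j]| >= c.

    Hypothetical sketch: `boundary` is supplied by the caller here,
    whereas the paper selects the size boundary automatically.
    Elements are assumed to be mutually comparable (e.g. all ints).
    """
    sets = [frozenset(s) for s in sets]
    large = [i for i, s in enumerate(sets) if len(s) >= boundary]
    small = [i for i, s in enumerate(sets) if len(s) < boundary]
    results = set()

    # Large sets: few in number, so compare each against every other set.
    for i in large:
        for j in range(len(sets)):
            if i != j and len(sets[i] & sets[j]) >= c:
                results.add((min(i, j), max(i, j)))

    # Small sets: index every c-subset; sets that share a c-subset
    # have an overlap of at least c.
    index = defaultdict(list)
    for i in small:
        for subset in combinations(sorted(sets[i]), c):
            index[subset].append(i)
    for ids in index.values():
        # ids are appended in increasing order, so each pair has a < b.
        for a, b in combinations(ids, 2):
            results.add((a, b))

    return sorted(results)


example = [{1, 2, 3}, {2, 3, 4}, {1, 2, 3, 4, 5, 6}, {7, 8}, {5, 6, 7, 8}]
pairs = overlap_join(example, c=2, boundary=4)
# pairs == [(0, 1), (0, 2), (1, 2), (2, 4), (3, 4)]
```

The trade-off the boundary controls is visible even in this toy version: a small set of size s contributes on the order of s^c subsets to the index, so only sets below the boundary are enumerated, and the remaining large sets are cheap to scan because there cannot be many of them.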
- Paper ID: 5468
- Venue: SIGMOD
- Year: 2018
- Pagerank: 6.263585e-05
- Overall Rank: 4,353 | 69.72%
- DOI: 10.1145/3183713.3183748
Incoming Citations (Sorted by Pagerank)
Showing 16 of 16 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,402 | Smurf: Self-Service String Matching Using Random Forests | 2019 | VLDB | 6.2195162e-05 |
| 5,469 | Learned Cardinality Estimation for Similarity Queries | 2021 | SIGMOD | 5.4898192e-05 |
| 6,074 | Pigeonring: A Principle for Faster Thresholded Similarity Search | 2019 | VLDB | 5.2242306e-05 |
| 6,647 | Fast Join Project Query Evaluation using Matrix Multiplication | 2020 | SIGMOD | 4.9772122e-05 |
| 7,635 | Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts | 2021 | SIGMOD | 4.6908858e-05 |
| 7,765 | Cache-oblivious High-performance Similarity Join | 2019 | SIGMOD | 4.6572085e-05 |
| 8,291 | TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection | 2022 | SIGMOD | 4.5435639e-05 |
| 8,910 | R2D2: Reducing Redundancy and Duplication in Data Lakes | 2023 | SIGMOD | 4.427232e-05 |
| 8,966 | Output-sensitive Conjunctive Query Evaluation | 2024 | PODS | 4.4193184e-05 |
| 9,832 | Balance-Aware Distributed String Similarity-Based Query Processing System | 2019 | VLDB | 4.2751057e-05 |
| 9,876 | Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation | 2023 | SIGMOD | 4.2667743e-05 |
| 10,245 | SeDA: Bridging the Gap between Efficient Syntactic and Precise Semantic Search of Similar Passages in Large Text Corpora | 2026 | VLDB | 4.1945683e-05 |
| 10,706 | Extensible and Robust Evaluation of Similarity Queries | 2025 | VLDB | 4.1945683e-05 |
| 10,951 | Determining the Largest Overlap between Tables | 2024 | SIGMOD | 4.1945683e-05 |
| 11,247 | A Two-Level Signature Scheme for Stable Set Similarity Joins | 2023 | VLDB | 4.1945683e-05 |
| 11,504 | LES3: Learning-based Exact Set Similarity Search | 2021 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 15 of 15 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 34 | Similarity Search in High Dimensions via Hashing | 1999 | VLDB | 0.00076637636 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 1,048 | Set Containment Joins: The Good, The Bad and The Ugly | 2000 | VLDB | 0.00014457009 |
| 1,305 | Bayesian Locality Sensitive Hashing for Fast Similarity Search | 2012 | VLDB | 0.00012687101 |
| 1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748 |
| 2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05 |
| 2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05 |
| 3,459 | An Empirical Evaluation of Set Similarity Join Techniques | 2016 | VLDB | 7.072508e-05 |
| 3,490 | Leveraging Set Relations in Exact Set Similarity Join | 2017 | VLDB | 7.0465856e-05 |
| 3,514 | Spatio-Textual Similarity Joins | 2013 | VLDB | 7.0226998e-05 |
| 4,050 | An Efficient Partition Based Method for Exact Set Similarity Joins | 2016 | VLDB | 6.4953612e-05 |
| 4,401 | LEMP: Fast Retrieval of Large Entries in a Matrix Product | 2015 | SIGMOD | 6.2211271e-05 |
| 4,808 | On the Complexity of Inner Product Similarity Join | 2016 | PODS | 5.908896e-05 |
| 6,726 | A Pivotal Prefix Based Filtering Algorithm for String Similarity Search | 2014 | SIGMOD | 4.9484027e-05 |
Semantically Similar Papers