Database Paper Browser

Efficient Exact Set-Similarity Joins

Summary: Presents exact set-similarity join (SSJoin) algorithms, which identify all pairs of highly similar sets drawn from two input collections. The paper claims to be the first to combine exact results with precise performance guarantees, whereas earlier methods with performance guarantees were only probabilistically approximate. The algorithms are validated on real and synthetic datasets. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 9504
Venue: VLDB
Year: 2006
Pagerank: 0.00029718727
Overall Rank: 266 | 98.16%
DOI: -

Authors

Incoming Citations (Sorted by Pagerank)

Showing 28 of 78 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
6,726 | A Pivotal Prefix Based Filtering Algorithm for String Similarity Search | 2014 | SIGMOD | 4.9484027e-05
7,185 | Certus: An Effective Entity Resolution Approach with Graph Differential Dependencies (GDDs) | 2019 | VLDB | 4.8066159e-05
7,215 | SyncSignature: A Simple, Efficient, Parallelizable Framework for Tree Similarity Joins | 2023 | VLDB | 4.7985991e-05
7,588 | Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases | 2013 | VLDB | 4.7030914e-05
7,668 | Human-in-the-loop Data Integration | 2017 | VLDB | 4.6834075e-05
7,669 | Incorporating String Transformations in Record Matching | 2008 | SIGMOD | 4.6833751e-05
7,847 | Set Similarity Join on Probabilistic Data | 2010 | VLDB | 4.6365272e-05
8,137 | Customizable and Scalable Fuzzy Join for Big Data | 2019 | VLDB | 4.5774794e-05
8,291 | TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection | 2022 | SIGMOD | 4.5435639e-05
8,575 | THERMAL-JOIN: A Scalable Spatial Join for Dynamic Workloads | 2015 | SIGMOD | 4.4928872e-05
8,618 | Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data | 2024 | SIGMOD | 4.4838259e-05
8,932 | Comparative evaluation of entity resolution approaches with FEVER | 2009 | VLDB | 4.427232e-05
9,439 | On-the-Fly Token Similarity Joins in Relational Databases | 2014 | SIGMOD | 4.3423824e-05
9,502 | Streaming Similarity Self-Join | 2016 | VLDB | 4.3341665e-05
9,832 | Balance-Aware Distributed String Similarity-Based Query Processing System | 2019 | VLDB | 4.2751057e-05
9,850 | COMPARE: Accelerating Groupwise Comparison in Relational Databases for Data Analytics | 2021 | VLDB | 4.2721228e-05
9,932 | Local Filtering: Improving the Performance of Approximate Queries on String Collections | 2015 | SIGMOD | 4.2500258e-05
9,933 | Efficient and Effective KNN Sequence Search with Approximate n-grams | 2014 | VLDB | 4.2500258e-05
10,068 | DiskJoin: Large-scale Vector Similarity Join with SSD | 2026 | SIGMOD | 4.1945683e-05
10,706 | Extensible and Robust Evaluation of Similarity Queries | 2025 | VLDB | 4.1945683e-05
10,930 | Similarity Joins of Sparse Features | 2024 | SIGMOD | 4.1945683e-05
11,087 | Dealing with Acronyms, Abbreviations, and Typos in Real-World Entity Matching | 2024 | VLDB | 4.1945683e-05
11,175 | Grouping Time Series for Efficient Columnar Storage | 2023 | SIGMOD | 4.1945683e-05
11,247 | A Two-Level Signature Scheme for Stable Set Similarity Joins | 2023 | VLDB | 4.1945683e-05
11,305 | TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching | 2023 | VLDB | 4.1945683e-05
11,504 | LES3: Learning-based Exact Set Similarity Search | 2021 | VLDB | 4.1945683e-05
11,724 | ZigZag: Supporting Similarity Queries on Vector Space Models | 2018 | SIGMOD | 4.1945683e-05
11,979 | Similarity Joins for Uncertain Strings | 2014 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 12 of 12 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers