Database Paper Browser

Back to papers

An Efficient Partition Based Method for Exact Set Similarity Joins

Summary: Partition-based exact set similarity joins; shared subset necessary for similarity, enabling stronger pruning than prefix filtering. Mixes 1-deletion neighborhoods; DP-guided greedy 2-approx with adaptive grouping yields O(s log s) per set and 2–5x speedups. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID
11328
Venue
VLDB
Year
2016
Pagerank
6.4953612e-05
Overall Rank
4,050 | 71.83%
DOI
-

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 21 of 21 citing papers.

Rank Citing Paper Year Venue Pagerank
2,175 Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services 2017 SIGMOD 9.3644117e-05
3,490 Leveraging Set Relations in Exact Set Similarity Join 2017 VLDB 7.0465856e-05
4,353 Overlap Set Similarity Joins with Theoretical Guarantees 2018 SIGMOD 6.263585e-05
4,775 Set Similarity Joins on MapReduce: An Experimental Survey 2018 VLDB 5.9315784e-05
5,365 Question Answering Over Knowledge Graphs: Question Understanding Via Template Decomposition 2018 VLDB 5.5461187e-05
5,469 Learned Cardinality Estimation for Similarity Queries 2021 SIGMOD 5.4898192e-05
6,074 Pigeonring: A Principle for Faster Thresholded Similarity Search 2019 VLDB 5.2242306e-05
6,605 Dima: A Distributed In-Memory Similarity-Based Query Processing System 2017 VLDB 4.9965703e-05
7,185 Certus: An Effective Entity Resolution Approach with Graph Differential Dependencies (GDDs) 2019 VLDB 4.8066159e-05
7,635 Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts 2021 SIGMOD 4.6908858e-05
7,668 Human-in-the-loop Data Integration 2017 VLDB 4.6834075e-05
8,137 Customizable and Scalable Fuzzy Join for Big Data 2019 VLDB 4.5774794e-05
8,291 TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection 2022 SIGMOD 4.5435639e-05
8,650 HAP: An Efficient Hamming Space Index Based on Augmented Pigeonhole Principle 2022 SIGMOD 4.4761716e-05
9,832 Balance-Aware Distributed String Similarity-Based Query Processing System 2019 VLDB 4.2751057e-05
9,876 Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation 2023 SIGMOD 4.2667743e-05
10,706 Extensible and Robust Evaluation of Similarity Queries 2025 VLDB 4.1945683e-05
10,930 Similarity Joins of Sparse Features 2024 SIGMOD 4.1945683e-05
11,247 A Two-Level Signature Scheme for Stable Set Similarity Joins 2023 VLDB 4.1945683e-05
11,504 LES3: Learning-based Exact Set Similarity Search 2021 VLDB 4.1945683e-05
11,724 ZigZag: Supporting Similarity Queries on Vector Space Models 2018 SIGMOD 4.1945683e-05
Previous Page 1 / 1 Next

Outgoing Citations (Sorted by Pagerank)

Showing 10 of 10 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Previous Page 1 / 1 Next

Semantically Similar Papers