Database Paper Browser

Pass-Join: A Partition-based Method for Similarity Joins

Summary: Pass-Join is a partition-based method for similarity joins under an edit-distance threshold. It partitions each string into segments and builds inverted indices over those segments, adaptively selects substrings so as to minimize the number of candidate pairs, and applies pruning techniques to verify the remaining pairs efficiently. It is reported to achieve state-of-the-art performance on real data sets for both short and long strings. (summarized by gpt-5-nano on Feb 09 2026)
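
The segment-and-index idea in the summary can be illustrated with a minimal Python sketch. It assumes the standard pigeonhole argument: if a string is split into tau + 1 segments and another string is within edit distance tau of it, at least one segment must occur unchanged as a substring of the other string. The function names (partition, substrings, edit_distance, similarity_join) are hypothetical, every substring of the probe string is looked up naively, and verification is a plain dynamic program, so this is not the paper's adaptive substring selection or its verification pruning.

```python
from collections import defaultdict


def partition(s, tau):
    """Split s into tau + 1 contiguous segments of near-equal length."""
    k = tau + 1
    base, extra = divmod(len(s), k)
    segments, start = [], 0
    for i in range(k):
        # The last `extra` segments get one additional character.
        length = base + (1 if i >= k - extra else 0)
        segments.append(s[start:start + length])
        start += length
    return segments


def substrings(s):
    """All distinct substrings of s, including the empty string."""
    subs = {""}
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            subs.add(s[i:j])
    return subs


def edit_distance(a, b):
    """Classic row-by-row dynamic program for Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity_join(strings, tau):
    """Return index pairs (i, j), i < j, whose edit distance is at most tau."""
    index = defaultdict(set)  # segment text -> ids of already indexed strings
    results = set()
    for sid, s in enumerate(strings):
        # Candidate generation: any indexed string sharing a segment with s.
        candidates = set()
        for sub in substrings(s):
            candidates |= index.get(sub, set())
        # Verification: length filter first, then the exact distance.
        for rid in candidates:
            r = strings[rid]
            if abs(len(r) - len(s)) <= tau and edit_distance(r, s) <= tau:
                results.add((rid, sid))
        # Index the segments of s for later probes.
        for seg in partition(s, tau):
            index[seg].add(sid)
    return results


names = ["vldb", "pvldb", "sigmod", "sigmoid", "icde"]
print(sorted(similarity_join(names, 1)))  # [(0, 1), (2, 3)]
```

Probing with every substring keeps the sketch short but is quadratic in the string length; restricting which substrings are probed is exactly the part the summary describes as adaptive selection, and it is left out here.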

Paper ID: 10475
Venue: VLDB
Year: 2012
Pagerank: 8.4795761e-05
Overall Rank: 2,592 | 81.97%
DOI: -

Authors

Incoming Citations (Sorted by Pagerank)

Showing 38 of 38 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748
2,435 | iDEC: Indexable Distance Estimating Codes for Approximate Nearest Neighbor Search | 2020 | VLDB | 8.8252237e-05
2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05
4,050 | An Efficient Partition Based Method for Exact Set Similarity Joins | 2016 | VLDB | 6.4953612e-05
4,250 | Local Similarity Search for Unstructured Text | 2016 | SIGMOD | 6.3241139e-05
4,353 | Overlap Set Similarity Joins with Theoretical Guarantees | 2018 | SIGMOD | 6.263585e-05
4,684 | Approximate String Joins with Abbreviations | 2018 | VLDB | 6.0006406e-05
4,775 | Set Similarity Joins on MapReduce: An Experimental Survey | 2018 | VLDB | 5.9315784e-05
4,808 | On the Complexity of Inner Product Similarity Join | 2016 | PODS | 5.908896e-05
5,151 | String Similarity Measures and Joins with Synonyms | 2013 | SIGMOD | 5.6609851e-05
5,179 | SilkMoth: An Efficient Method for Finding Related Sets with Maximum Matching Constraints | 2017 | VLDB | 5.6428428e-05
5,232 | SEAL: Spatio-Textual Similarity Search | 2012 | VLDB | 5.6136151e-05
5,434 | Auto-FuzzyJoin: Auto-Program Fuzzy Similarity Joins Without Labeled Examples | 2021 | SIGMOD | 5.5045402e-05
5,469 | Learned Cardinality Estimation for Similarity Queries | 2021 | SIGMOD | 5.4898192e-05
5,902 | The Communication Complexity of Distributed Set-Joins with Applications to Matrix Multiplication | 2015 | PODS | 5.2796864e-05
6,074 | Pigeonring: A Principle for Faster Thresholded Similarity Search | 2019 | VLDB | 5.2242306e-05
6,241 | Scaling Similarity Joins over Tree-Structured Data | 2015 | VLDB | 5.1411469e-05
6,605 | Dima: A Distributed In-Memory Similarity-Based Query Processing System | 2017 | VLDB | 4.9965703e-05
6,726 | A Pivotal Prefix Based Filtering Algorithm for String Similarity Search | 2014 | SIGMOD | 4.9484027e-05
6,839 | Boosting Graph Similarity Search through Pre-Computation | 2021 | SIGMOD | 4.9109527e-05
7,109 | Efficient Similarity Join and Search on Multi-Attribute Data | 2015 | SIGMOD | 4.8292998e-05
7,141 | Efficient Error-tolerant Query Autocompletion | 2013 | VLDB | 4.8197901e-05
7,215 | SyncSignature: A Simple, Efficient, Parallelizable Framework for Tree Similarity Joins | 2023 | VLDB | 4.7985991e-05
7,588 | Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases | 2013 | VLDB | 4.7030914e-05
7,635 | Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts | 2021 | SIGMOD | 4.6908858e-05
7,668 | Human-in-the-loop Data Integration | 2017 | VLDB | 4.6834075e-05
7,700 | Near-Duplicate Text Alignment with One Permutation Hashing | 2024 | SIGMOD | 4.6744372e-05
8,291 | TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection | 2022 | SIGMOD | 4.5435639e-05
9,567 | META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion | 2016 | VLDB | 4.3254416e-05
9,832 | Balance-Aware Distributed String Similarity-Based Query Processing System | 2019 | VLDB | 4.2751057e-05
9,876 | Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation | 2023 | SIGMOD | 4.2667743e-05
9,932 | Local Filtering: Improving the Performance of Approximate Queries on String Collections | 2015 | SIGMOD | 4.2500258e-05
9,933 | Efficient and Effective KNN Sequence Search with Approximate n-grams | 2014 | VLDB | 4.2500258e-05
10,706 | Extensible and Robust Evaluation of Similarity Queries | 2025 | VLDB | 4.1945683e-05
10,930 | Similarity Joins of Sparse Features | 2024 | SIGMOD | 4.1945683e-05
11,087 | Dealing with Acronyms, Abbreviations, and Typos in Real-World Entity Matching | 2024 | VLDB | 4.1945683e-05
11,979 | Similarity Joins for Uncertain Strings | 2014 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 16 of 16 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
125 | Approximate String Joins in a Database (Almost) for Free | 2001 | VLDB | 0.00044847972
155 | Robust and Efficient Fuzzy Match for Online Data Cleaning | 2003 | SIGMOD | 0.00040637896
250 | Efficient Set Joins on Similarity Predicates | 2004 | SIGMOD | 0.00030661988
266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727
447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171
1,234 | Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints | 2008 | VLDB | 0.00013122499
2,376 | Bed-Tree: An All-Purpose Index Structure for String Similarity Search Based on Edit Distance | 2010 | SIGMOD | 8.9424361e-05
2,779 | Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries | 2008 | VLDB | 8.1320575e-05
3,226 | Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance | 2007 | VLDB | 7.3433307e-05
3,578 | Efficient Approximate Entity Extraction with Edit Distance Constraints | 2009 | SIGMOD | 6.9503858e-05
3,868 | An Efficient Filter for Approximate Membership Checking | 2008 | SIGMOD | 6.6822543e-05
4,216 | Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints | 2010 | VLDB | 6.3521675e-05
4,873 | Power-Law Based Estimation of Set Similarity Join Size | 2009 | VLDB | 5.8602304e-05
4,988 | Incremental Maintenance of Length Normalized Indexes for Approximate String Matching | 2009 | SIGMOD | 5.783959e-05
5,073 | Faerie: Efficient Filtering Algorithms for Approximate Dictionary-based Entity Extraction | 2011 | SIGMOD | 5.7177424e-05
5,379 | Scalable Ad-hoc Entity Extraction from Text Collections | 2008 | VLDB | 5.5405989e-05

Semantically Similar Papers