Database Paper Browser

Efficient Exact Set-Similarity Joins

Summary: Presents exact algorithms for the set-similarity join (SSJoin), which pairs similar sets drawn from two input collections. The algorithms are described as the first to combine exact results with performance guarantees, where earlier methods with guarantees were only probabilistic. They are validated on real and synthetic datasets. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 9504
Venue: VLDB
Year: 2006
Pagerank: 0.00029718727
Overall Rank: 266 | 98.16%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 78 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171
509 | On Active Learning of Record Matching Packages | 2010 | SIGMOD | 0.00021409518
627 | Management of Probabilistic Data: Foundations and Challenges | 2007 | PODS | 0.00018959005
936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
1,234 | Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints | 2008 | VLDB | 0.00013122499
1,345 | Entity Matching: How Similar Is Similar | 2011 | VLDB | 0.00012468408
1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748
1,533 | Example-driven Design of Efficient Record Matching Queries | 2007 | VLDB | 0.00011471971
1,715 | V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors | 2012 | VLDB | 0.00010803271
1,944 | WHAM: A High-throughput Sequence Alignment Method | 2011 | SIGMOD | 0.00010004608
2,024 | ATLAS: A Probabilistic Algorithm for High Dimensional Similarity Search | 2011 | SIGMOD | 9.7519678e-05
2,073 | Extending Autocompletion To Tolerate Errors | 2009 | SIGMOD | 9.6142791e-05
2,376 | Bed-Tree: An All-Purpose Index Structure for String Similarity Search Based on Edit Distance | 2010 | SIGMOD | 8.9424361e-05
2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05
2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05
2,779 | Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries | 2008 | VLDB | 8.1320575e-05
3,140 | ZeroER: Entity Resolution using Zero Labeled Examples | 2020 | SIGMOD | 7.4841763e-05
3,141 | ClusterJoin: A Similarity Joins Framework using Map-Reduce | 2014 | VLDB | 7.4829448e-05
3,267 | Benchmarking Declarative Approximate Selection Predicates | 2007 | SIGMOD | 7.3058429e-05
3,459 | An Empirical Evaluation of Set Similarity Join Techniques | 2016 | VLDB | 7.072508e-05
3,490 | Leveraging Set Relations in Exact Set Similarity Join | 2017 | VLDB | 7.0465856e-05
3,514 | Spatio-Textual Similarity Joins | 2013 | VLDB | 7.0226998e-05
3,578 | Efficient Approximate Entity Extraction with Edit Distance Constraints | 2009 | SIGMOD | 6.9503858e-05
3,774 | Efficient Exact Edit Similarity Query Processing with the Asymmetric Signature Scheme | 2011 | SIGMOD | 6.7757301e-05
3,868 | An Efficient Filter for Approximate Membership Checking | 2008 | SIGMOD | 6.6822543e-05
4,050 | An Efficient Partition Based Method for Exact Set Similarity Joins | 2016 | VLDB | 6.4953612e-05
4,216 | Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints | 2010 | VLDB | 6.3521675e-05
4,250 | Local Similarity Search for Unstructured Text | 2016 | SIGMOD | 6.3241139e-05
4,353 | Overlap Set Similarity Joins with Theoretical Guarantees | 2018 | SIGMOD | 6.263585e-05
4,402 | Smurf: Self-Service String Matching Using Random Forests | 2019 | VLDB | 6.2195162e-05
4,684 | Approximate String Joins with Abbreviations | 2018 | VLDB | 6.0006406e-05
4,775 | Set Similarity Joins on MapReduce: An Experimental Survey | 2018 | VLDB | 5.9315784e-05
4,808 | On the Complexity of Inner Product Similarity Join | 2016 | PODS | 5.908896e-05
4,873 | Power-Law Based Estimation of Set Similarity Join Size | 2009 | VLDB | 5.8602304e-05
4,988 | Incremental Maintenance of Length Normalized Indexes for Approximate String Matching | 2009 | SIGMOD | 5.783959e-05
4,995 | On Link-based Similarity Join | 2011 | VLDB | 5.7787414e-05
5,073 | Faerie: Efficient Filtering Algorithms for Approximate Dictionary-based Entity Extraction | 2011 | SIGMOD | 5.7177424e-05
5,151 | String Similarity Measures and Joins with Synonyms | 2013 | SIGMOD | 5.6609851e-05
5,179 | SilkMoth: An Efficient Method for Finding Related Sets with Maximum Matching Constraints | 2017 | VLDB | 5.6428428e-05
5,220 | Similarity Join Size Estimation using Locality Sensitive Hashing | 2011 | VLDB | 5.6216111e-05
5,232 | SEAL: Spatio-Textual Similarity Search | 2012 | VLDB | 5.6136151e-05
5,365 | Question Answering Over Knowledge Graphs: Question Understanding Via Template Decomposition | 2018 | VLDB | 5.5461187e-05
5,379 | Scalable Ad-hoc Entity Extraction from Text Collections | 2008 | VLDB | 5.5405989e-05
5,434 | Auto-FuzzyJoin: Auto-Program Fuzzy Similarity Joins Without Labeled Examples | 2021 | SIGMOD | 5.5045402e-05
5,536 | On Indexing Error-Tolerant Set Containment | 2010 | SIGMOD | 5.4532734e-05
5,887 | Efficient Approximate Search on String Collections (Tutorial) | 2009 | VLDB | 5.2879769e-05
5,902 | The Communication Complexity of Distributed Set-Joins with Applications to Matrix Multiplication | 2015 | PODS | 5.2796864e-05
6,074 | Pigeonring: A Principle for Faster Thresholded Similarity Search | 2019 | VLDB | 5.2242306e-05
6,605 | Dima: A Distributed In-Memory Similarity-Based Query Processing System | 2017 | VLDB | 4.9965703e-05

Outgoing Citations (Sorted by Pagerank)

Showing 12 of 12 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
