Database Paper Browser


Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints

Summary: Ed-Join derives two edit-distance lower bounds by analyzing the locations and contents of mismatching q-grams, which gives tighter filtering for similarity joins under edit-distance constraints. It shrinks candidate sets and reduces running time, outperforming prior q-gram-based approaches on large real datasets. (summarized by gpt-5-nano on Feb 09 2026)
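To make the filtering idea in the summary concrete, here is a minimal sketch of the classic q-gram count filter that q-gram-based joins build on. It is not Ed-Join's own pair of bounds, which the summary does not spell out. The function names (`qgrams`, `qgram_lower_bound`, `edit_distance`, `may_match`) and the default `q = 2` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import ceil


def qgrams(s: str, q: int) -> Counter:
    """Multiset of all length-q substrings of s (no padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_lower_bound(s: str, t: str, q: int = 2) -> int:
    """Lower bound on edit distance from the classic count filter.

    One edit operation can destroy at most q q-grams, so two strings within
    distance k share at least max(|s|, |t|) - q + 1 - k*q q-grams.
    Solving for k gives the bound returned here.
    """
    common = sum((qgrams(s, q) & qgrams(t, q)).values())
    missing = max(len(s), len(t)) - q + 1 - common
    return max(0, ceil(missing / q))


def edit_distance(s: str, t: str) -> int:
    """Textbook dynamic-programming Levenshtein distance (for verification)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def may_match(s: str, t: str, tau: int, q: int = 2) -> bool:
    """Cheap filter: False means the pair cannot be within distance tau."""
    return qgram_lower_bound(s, t, q) <= tau
```

In a join, `may_match` would be applied before the quadratic `edit_distance` check, so that only surviving candidate pairs are verified. A tighter lower bound prunes more pairs, which is the effect the summary attributes to Ed-Join.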

Paper ID: 9726
Venue: VLDB
Year: 2008
Pagerank: 0.00013122499
Overall Rank: 1,234 | 91.42%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 37 of 37 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171
1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748
2,073 | Extending Autocompletion To Tolerate Errors | 2009 | SIGMOD | 9.6142791e-05
2,175 | Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services | 2017 | SIGMOD | 9.3644117e-05
2,376 | Bed-Tree: An All-Purpose Index Structure for String Similarity Search Based on Edit Distance | 2010 | SIGMOD | 8.9424361e-05
2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05
2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05
3,578 | Efficient Approximate Entity Extraction with Edit Distance Constraints | 2009 | SIGMOD | 6.9503858e-05
3,774 | Efficient Exact Edit Similarity Query Processing with the Asymmetric Signature Scheme | 2011 | SIGMOD | 6.7757301e-05
4,216 | Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints | 2010 | VLDB | 6.3521675e-05
4,684 | Approximate String Joins with Abbreviations | 2018 | VLDB | 6.0006406e-05
4,901 | Probabilistic String Similarity Joins | 2010 | SIGMOD | 5.8411648e-05
5,073 | Faerie: Efficient Filtering Algorithms for Approximate Dictionary-based Entity Extraction | 2011 | SIGMOD | 5.7177424e-05
5,291 | Fast Subtrajectory Similarity Search in Road Networks under Weighted Edit Distance Constraints | 2020 | VLDB | 5.5826473e-05
5,887 | Efficient Approximate Search on String Collections (Tutorial) | 2009 | VLDB | 5.2879769e-05
6,074 | Pigeonring: A Principle for Faster Thresholded Similarity Search | 2019 | VLDB | 5.2242306e-05
6,726 | A Pivotal Prefix Based Filtering Algorithm for String Similarity Search | 2014 | SIGMOD | 4.9484027e-05
6,839 | Boosting Graph Similarity Search through Pre-Computation | 2021 | SIGMOD | 4.9109527e-05
7,061 | Serving Deep Learning Models with Deduplication from Relational Databases | 2022 | VLDB | 4.8463881e-05
7,109 | Efficient Similarity Join and Search on Multi-Attribute Data | 2015 | SIGMOD | 4.8292998e-05
7,215 | SyncSignature: A Simple, Efficient, Parallelizable Framework for Tree Similarity Joins | 2023 | VLDB | 4.7985991e-05
7,588 | Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases | 2013 | VLDB | 4.7030914e-05
7,668 | Human-in-the-loop Data Integration | 2017 | VLDB | 4.6834075e-05
7,700 | Near-Duplicate Text Alignment with One Permutation Hashing | 2024 | SIGMOD | 4.6744372e-05
7,765 | Cache-oblivious High-performance Similarity Join | 2019 | SIGMOD | 4.6572085e-05
8,932 | Comparative evaluation of entity resolution approaches with FEVER | 2009 | VLDB | 4.427232e-05
9,567 | META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion | 2016 | VLDB | 4.3254416e-05
9,832 | Balance-Aware Distributed String Similarity-Based Query Processing System | 2019 | VLDB | 4.2751057e-05
9,932 | Local Filtering: Improving the Performance of Approximate Queries on String Collections | 2015 | SIGMOD | 4.2500258e-05
10,068 | DiskJoin: Large-scale Vector Similarity Join with SSD | 2026 | SIGMOD | 4.1945683e-05
10,499 | Privacy and Accuracy-Aware AI/ML Model Deduplication | 2025 | SIGMOD | 4.1945683e-05
10,706 | Extensible and Robust Evaluation of Similarity Queries | 2025 | VLDB | 4.1945683e-05
10,930 | Similarity Joins of Sparse Features | 2024 | SIGMOD | 4.1945683e-05
11,087 | Dealing with Acronyms, Abbreviations, and Typos in Real-World Entity Matching | 2024 | VLDB | 4.1945683e-05
11,305 | TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching | 2023 | VLDB | 4.1945683e-05
11,724 | ZigZag: Supporting Similarity Queries on Vector Space Models | 2018 | SIGMOD | 4.1945683e-05
11,979 | Similarity Joins for Uncertain Strings | 2014 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 12 of 12 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers