Database Paper Browser


Auto-FuzzyJoin: Auto-Program Fuzzy Similarity Joins Without Labeled Examples

Summary: Auto-FuzzyJoin auto-programs fuzzy similarity joins without labeled data by exploiting a geometric interpretation of distance functions, so that the join meets a user-specified precision target tau while maximizing recall. On 50 Wikipedia-derived fuzzy-join tasks, it beats unsupervised baselines and rivals supervised methods with partial labels. Code and benchmark data are released on GitHub. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6104
Venue: SIGMOD
Year: 2021
Pagerank: 5.5045402e-05
Overall Rank: 5,434 | 62.20%
DOI: 10.1145/3448016.3452824

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 11 of 11 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 18 of 18 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
155 | Robust and Efficient Fuzzy Match for Online Data Cleaning | 2003 | SIGMOD | 0.00040637896
221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824
266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727
300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466
319 | Evaluation of entity resolution approaches on real-world match problems | 2010 | VLDB | 0.00027781866
447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171
712 | Magellan: Toward Building Entity Matching Management Systems | 2016 | VLDB | 0.00017732426
1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748
1,715 | V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors | 2012 | VLDB | 0.00010803271
2,514 | Comparative Analysis of Approximate Blocking Techniques for Entity Resolution | 2016 | VLDB | 8.6139012e-05
2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05
3,140 | ZeroER: Entity Resolution using Zero Labeled Examples | 2020 | SIGMOD | 7.4841763e-05
3,141 | ClusterJoin: A Similarity Joins Framework using Map-Reduce | 2014 | VLDB | 7.4829448e-05
3,328 | Multi-column Substring Matching for Database Schema Translation | 2006 | VLDB | 7.2174278e-05
3,528 | Distributed Data Deduplication | 2016 | VLDB | 7.0066139e-05
3,735 | Auto-Join: Joining Tables by Leveraging Transformations | 2017 | VLDB | 6.8061318e-05
4,147 | Exploiting MapReduce-based Similarity Joins | 2012 | SIGMOD | 6.4096022e-05
4,850 | SEMA-JOIN: Joining Semantically-Related Tables Using Big Table Corpora | 2015 | VLDB | 5.8768452e-05

Semantically Similar Papers