Auto-FuzzyJoin: Auto-Program Fuzzy Similarity Joins Without Labeled Examples
Summary: Auto-FuzzyJoin automatically programs fuzzy similarity joins without labeled data. It exploits a geometric interpretation of distance functions to meet a user-specified precision target tau while maximizing recall. On 50 Wikipedia-derived fuzzy-join tasks, it outperforms unsupervised baselines and is competitive with supervised methods trained on partial labels; code and benchmark data are released on GitHub.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 6104
- Venue: SIGMOD
- Year: 2021
- Pagerank: 5.5045402e-05
- Overall Rank: 5,434 | 62.20%
- DOI: 10.1145/3448016.3452824
[Chart: Incoming Non-self Citations Over Time]
Incoming Citations (Sorted by Pagerank)
Showing 11 of 11 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 2,587 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks | 2024 | SIGMOD | 8.4924618e-05 |
| 3,942 | Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins | 2022 | VLDB | 6.6114622e-05 |
| 5,869 | Demonstration of Panda: A Weakly Supervised Entity Matching System | 2021 | VLDB | 5.2959029e-05 |
| 6,553 | How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses | 2024 | VLDB | 5.0157344e-05 |
| 6,800 | DTT: An Example-Driven Tabular Transformer for Joinability by Leveraging Large Language Models | 2024 | SIGMOD | 4.9231471e-05 |
| 8,099 | Sparkly: A Simple yet Surprisingly Strong TF/IDF Blocker for Entity Matching | 2023 | VLDB | 4.5859317e-05 |
| 9,399 | TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations | 2025 | VLDB | 4.3441378e-05 |
| 9,409 | Ground Truth Inference for Weakly Supervised Entity Matching | 2023 | SIGMOD | 4.3441378e-05 |
| 9,490 | Auto-BI: Automatically Build BI-Models Leveraging Local Join Prediction and Global Schema Graph | 2023 | VLDB | 4.3341665e-05 |
| 10,598 | Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence | 2025 | VLDB | 4.1945683e-05 |
| 10,754 | OmniMatch: Joinability Discovery in Data Products | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 18 of 18 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 155 | Robust and Efficient Fuzzy Match for Online Data Cleaning | 2003 | SIGMOD | 0.00040637896 |
| 221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466 |
| 319 | Evaluation of entity resolution approaches on real-world match problems | 2010 | VLDB | 0.00027781866 |
| 447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171 |
| 712 | Magellan: Toward Building Entity Matching Management Systems | 2016 | VLDB | 0.00017732426 |
| 1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748 |
| 1,715 | V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors | 2012 | VLDB | 0.00010803271 |
| 2,514 | Comparative Analysis of Approximate Blocking Techniques for Entity Resolution | 2016 | VLDB | 8.6139012e-05 |
| 2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05 |
| 3,140 | ZeroER: Entity Resolution using Zero Labeled Examples | 2020 | SIGMOD | 7.4841763e-05 |
| 3,141 | ClusterJoin: A Similarity Joins Framework using Map-Reduce | 2014 | VLDB | 7.4829448e-05 |
| 3,328 | Multi-column Substring Matching for Database Schema Translation | 2006 | VLDB | 7.2174278e-05 |
| 3,528 | Distributed Data Deduplication | 2016 | VLDB | 7.0066139e-05 |
| 3,735 | Auto-Join: Joining Tables by Leveraging Transformations | 2017 | VLDB | 6.8061318e-05 |
| 4,147 | Exploiting MapReduce-based Similarity Joins | 2012 | SIGMOD | 6.4096022e-05 |
| 4,850 | SEMA-JOIN: Joining Semantically-Related Tables Using Big Table Corpora | 2015 | VLDB | 5.8768452e-05 |
Semantically Similar Papers