Deep Entity Matching with Pre-Trained Language Models
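Summary: Ditto casts entity matching as sequence-pair classification with pre-trained transformer language models, outperforming the previous state of the art by up to 29% F1. Three optimizations improve matching quality with fewer labels: highlighting domain-relevant parts of the input, summarizing overly long inputs, and augmenting the training data with hard examples. On matching two datasets of 789K and 412K records, Ditto reaches 96.5% F1. (summarized by gpt-5-nano on Feb 09 2026)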
Authors
1. Yuliang Li
2. Jinfeng Li
3. Yoshihiko Suhara
4. AnHai Doan
5. Wang-Chiew Tan
Incoming Citations (Sorted by Pagerank)
Showing 50 of 90 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 11 of 11 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 263 | CrowdER: Crowdsourcing Entity Resolution | 2012 | VLDB | 0.00029862413 |
| 267 | Human-powered Sorts and Joins | 2012 | VLDB | 0.00029690405 |
| 300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466 |
| 319 | Evaluation of entity resolution approaches on real-world match problems | 2010 | VLDB | 0.00027781866 |
| 643 | Corleone: Hands-Off Crowdsourcing for Entity Matching | 2014 | SIGMOD | 0.00018754451 |
| 712 | Magellan: Toward Building Entity Matching Management Systems | 2016 | VLDB | 0.00017732426 |
| 754 | Distributed Representations of Tuples for Entity Resolution | 2018 | VLDB | 0.00017117211 |
| 1,345 | Entity Matching: How Similar Is Similar | 2011 | VLDB | 0.00012468408 |
| 1,831 | Synthesizing Entity Matching Rules by Examples | 2018 | VLDB | 0.00010384082 |
| 2,767 | A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching | 2020 | SIGMOD | 0.000081513883 |
| 3,582 | NADEEF/ER: Generic and Interactive Entity Resolution | 2014 | SIGMOD | 0.000069479263 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,767 | A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching | 2020 | SIGMOD | 0.000081513883 |
| 3,578 | Efficient Approximate Entity Extraction with Edit Distance Constraints | 2009 | SIGMOD | 0.000069503858 |
| 9,460 | The Battleship Approach to the Low Resource Entity Matching Problem | 2023 | SIGMOD | 0.000043366491 |
| 3,640 | Deep Learning for Blocking in Entity Matching: A Design Space Exploration | 2021 | VLDB | 0.000068891671 |
| 6,569 | Domain Adaptation for Deep Entity Resolution | 2022 | SIGMOD | 0.000050065379 |
| 7,052 | Pre-trained Embeddings for Entity Resolution: An Experimental Analysis | 2023 | VLDB | 0.000048497453 |
| 4,837 | Entity Resolution with Hierarchical Graph Attention Networks | 2022 | SIGMOD | 0.000058892326 |
| 6,711 | Analyzing How BERT Performs Entity Matching | 2022 | VLDB | 0.000049517546 |
| 300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466 |
| 5,533 | Dual-Objective Fine-Tuning of BERT for Entity Matching | 2021 | VLDB | 0.000054544359 |