Database Paper Browser

Dealing with Acronyms, Abbreviations, and Typos in Real-World Entity Matching

Summary: Smash is a similarity measure with a dynamic-programming algorithm that jointly handles acronyms, abbreviations, and typos in entity/record matching, without needing pre-specified synonym rules. With two optimizations and an OpenRefine integration, it yields large F‑score gains over strong baselines (including GPT‑4). (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 13610
Venue: VLDB
Year: 2024
Pagerank: 4.1945683e-05
Overall Rank: 11,087 | 22.87%
DOI: 10.14778/3685800.3685830

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 16 of 16 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824
266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727
1,234 | Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints | 2008 | VLDB | 0.00013122499
1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748
2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05
2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05
3,230 | Learning Semantic String Transformations from Examples | 2012 | VLDB | 7.339123e-05
3,459 | An Empirical Evaluation of Set Similarity Join Techniques | 2016 | VLDB | 7.072508e-05
4,216 | Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints | 2010 | VLDB | 6.3521675e-05
4,402 | Smurf: Self-Service String Matching Using Random Forests | 2019 | VLDB | 6.2195162e-05
4,684 | Approximate String Joins with Abbreviations | 2018 | VLDB | 6.0006406e-05
4,850 | SEMA-JOIN: Joining Semantically-Related Tables Using Big Table Corpora | 2015 | VLDB | 5.8768452e-05
5,151 | String Similarity Measures and Joins with Synonyms | 2013 | SIGMOD | 5.6609851e-05
5,179 | SilkMoth: An Efficient Method for Finding Related Sets with Maximum Matching Constraints | 2017 | VLDB | 5.6428428e-05
5,958 | Fine-grained Concept Linking using Neural Networks in Healthcare | 2018 | SIGMOD | 5.2563968e-05
9,563 | Towards a Unified Framework for String Similarity Joins | 2019 | VLDB | 4.3254416e-05

Semantically Similar Papers