Incorporating String Transformations in Record Matching
Summary: Extends record matching by allowing user-defined string transformations (e.g., Robert/Bob) to define string similarity. Proposes an index-aware fuzzy-lookup framework that combines transformations with a base similarity, achieving better match quality and faster retrieval. (summarized by gpt-5-nano on Feb 09 2026)
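As a rough illustration of the idea in the summary, the sketch below combines a small set of user-defined transformations with a base similarity. It is a minimal sketch under stated assumptions, not the paper's algorithm: the rule table `TRANSFORMATIONS`, the helper names, the token-level rewriting, and the choice of Jaccard as the base similarity are all hypothetical, and the index-aware lookup mentioned in the summary is not modeled.

```python
from itertools import product

# Hypothetical transformation rules (token -> equivalent tokens); illustrative only.
TRANSFORMATIONS = {
    "bob": ["robert"],
    "st": ["street"],
}

def tokenize(s):
    """Lowercase and split on whitespace."""
    return s.lower().split()

def variants(tokens, rules):
    """Yield every token list obtained by optionally rewriting each token."""
    options = [[t] + rules.get(t, []) for t in tokens]
    for combo in product(*options):
        yield list(combo)

def jaccard(a, b):
    """Base similarity (an assumption): Jaccard overlap of the two token sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def transformed_similarity(x, y, rules=TRANSFORMATIONS):
    """Best base similarity over all transformed variants of x and y."""
    return max(
        jaccard(vx, vy)
        for vx in variants(tokenize(x), rules)
        for vy in variants(tokenize(y), rules)
    )

plain = jaccard(tokenize("Bob Smith"), tokenize("Robert Smith"))
with_rules = transformed_similarity("Bob Smith", "Robert Smith")
print(plain, with_rules)
```

Without the rules, "Bob Smith" and "Robert Smith" share only one of three distinct tokens; with the Bob/Robert rule applied, one variant matches exactly. The brute-force enumeration grows exponentially with the number of rewritable tokens, so it is only suitable for short strings and small rule sets.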
Authors
1. Arvind Arasu
2. Surajit Chaudhuri
3. Kris Ganjam
4. Raghav Kaushik
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,578 | Efficient Approximate Entity Extraction with Edit Distance Constraints | 2009 | SIGMOD | 6.9503858e-05 |
| 5,887 | Efficient Approximate Search on String Collections (Tutorial) | 2009 | VLDB | 5.2879769e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 34 | Similarity Search in High Dimensions via Hashing | 1999 | VLDB | 0.00076637636 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 322 | Record Linkage: Similarity Measures and Algorithms | 2006 | SIGMOD | 0.00027518768 |
| 7,725 | Data Cleaning in Microsoft SQL Server 2005 | 2005 | SIGMOD | 4.6670883e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 125 | Approximate String Joins in a Database (Almost) for Free | 2001 | VLDB | 0.00044847972 |
| 9,563 | Towards a Unified Framework for String Similarity Joins | 2019 | VLDB | 4.3254416e-05 |
| 4,901 | Probabilistic String Similarity Joins | 2010 | SIGMOD | 5.8411648e-05 |
| 4,684 | Approximate String Joins with Abbreviations | 2018 | VLDB | 6.0006406e-05 |
| 11,979 | Similarity Joins for Uncertain Strings | 2014 | SIGMOD | 4.1945683e-05 |
| 1,533 | Example-driven Design of Efficient Record Matching Queries | 2007 | VLDB | 0.00011471971 |
| 5,536 | On Indexing Error-Tolerant Set Containment | 2010 | SIGMOD | 5.4532734e-05 |
| 5,151 | String Similarity Measures and Joins with Synonyms | 2013 | SIGMOD | 5.6609851e-05 |
| 4,026 | Flexible String Matching Against Large Databases in Practice | 2004 | VLDB | 6.5169976e-05 |
| 3,451 | Learning String Transformations From Examples | 2009 | VLDB | 7.0822216e-05 |