Towards a Unified Framework for String Similarity Joins
Summary: Unified framework for combining syntactic, synonym-based, and taxonomy-based string similarities in joins. NP-hard to maximize; introduces a polynomial-time approximation and a pebble-based multi-measure signature for efficient filtering. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
- 1. Pengfei Xu
- 2. Jiaheng Lu
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,087 | Dealing with Acronyms, Abbreviations, and Typos in Real-World Entity Matching | 2024 | VLDB | 4.1945683e-05 |
Previous
Page 1 / 1
Next
Outgoing Citations (Sorted by Pagerank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 280 | Eliminating Fuzzy Duplicates in Data Warehouses | 2002 | VLDB | 0.00029113044 |
| 1,193 | Join Size Estimation Subject to Filter Conditions | 2015 | VLDB | 0.00013414989 |
| 1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748 |
| 2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05 |
| 3,451 | Learning String Transformations From Examples | 2009 | VLDB | 7.0822216e-05 |
| 4,684 | Approximate String Joins with Abbreviations | 2018 | VLDB | 6.0006406e-05 |
| 5,151 | String Similarity Measures and Joins with Synonyms | 2013 | SIGMOD | 5.6609851e-05 |
| 6,074 | Pigeonring: A Principle for Faster Thresholded Similarity Search | 2019 | VLDB | 5.2242306e-05 |
Previous
Page 1 / 1
Next
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,684 | Approximate String Joins with Abbreviations | 2018 | VLDB | 6.0006406e-05 |
| 7,109 | Efficient Similarity Join and Search on Multi-Attribute Data | 2015 | SIGMOD | 4.8292998e-05 |
| 2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05 |
| 8,899 | Fast Approximate Similarity Join in Vector Databases | 2025 | SIGMOD | 4.427232e-05 |
| 125 | Approximate String Joins in a Database (Almost) for Free | 2001 | VLDB | 0.00044847972 |
| 4,901 | Probabilistic String Similarity Joins | 2010 | SIGMOD | 5.8411648e-05 |
| 4,216 | Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints | 2010 | VLDB | 6.3521675e-05 |
| 5,151 | String Similarity Measures and Joins with Synonyms | 2013 | SIGMOD | 5.6609851e-05 |
| 2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05 |
| 11,979 | Similarity Joins for Uncertain Strings | 2014 | SIGMOD | 4.1945683e-05 |