Approximate String Joins in a Database (Almost) for Free
Summary: Proposes implementing approximate string joins on top of commercial databases using q-grams: the positions and counts of matching q-grams are encoded so that the approximate-match predicate can be rewritten as a relational expression. Shows gains over UDF-based approaches for both full-string and substring joins, validated on real data with a prototype.
(summarized by gpt-5-nano on Feb 09 2026)
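The q-gram idea in the summary can be illustrated with a minimal Python sketch. It is not the paper's implementation, which expresses these checks in SQL. The function names, the `#`/`$` padding characters, and the default `q=3` are assumptions made here for illustration. The threshold is the count bound usually quoted for this technique: two strings within edit distance `k` share at least `max(|s1|, |s2|) - 1 - (k - 1) * q` q-grams.

```python
def positional_qgrams(s, q=3):
    """Return (position, q-gram) pairs for s.

    The string is padded with q-1 '#' characters in front and q-1 '$'
    characters behind (assumed not to occur in the data).
    """
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [(i, padded[i:i + q]) for i in range(len(padded) - q + 1)]


def is_candidate(s1, s2, k, q=3):
    """Cheap filter: can s1 and s2 possibly be within edit distance k?

    True means "verify with a real edit-distance computation";
    False means the pair can be discarded safely.
    """
    # Length filter: lengths of matching strings differ by at most k.
    if abs(len(s1) - len(s2)) > k:
        return False

    # Position filter: count equal q-grams whose positions differ by at most k.
    matches = sum(
        1
        for p1, g1 in positional_qgrams(s1, q)
        for p2, g2 in positional_qgrams(s2, q)
        if g1 == g2 and abs(p1 - p2) <= k
    )

    # Count filter: enough q-grams must survive k edit operations.
    threshold = max(len(s1), len(s2)) - 1 - (k - 1) * q
    return matches >= threshold


if __name__ == "__main__":
    print(is_candidate("john smith", "jon smith", k=1))
    print(is_candidate("john smith", "jane doe!!", k=1))
```

Pairs that pass the filter are only candidates; a full edit-distance check is still needed to remove false positives.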
- Paper ID: 8773
- Venue: VLDB
- Year: 2001
- Pagerank: 0.00044847972
- Overall Rank: 125 | 99.14%
- DOI:
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 27 of 77 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
| 6,074 | Pigeonring: A Principle for Faster Thresholded Similarity Search | 2019 | VLDB | 5.2242306e-05 |
| 6,464 | Reference-Based Indexing of Sequence Databases | 2006 | VLDB | 5.0532607e-05 |
| 6,671 | Discovering Longest-lasting Correlation in Sequence Databases | 2013 | VLDB | 4.9669225e-05 |
| 6,726 | A Pivotal Prefix Based Filtering Algorithm for String Similarity Search | 2014 | SIGMOD | 4.9484027e-05 |
| 7,109 | Efficient Similarity Join and Search on Multi-Attribute Data | 2015 | SIGMOD | 4.8292998e-05 |
| 7,141 | Efficient Error-tolerant Query Autocompletion | 2013 | VLDB | 4.8197901e-05 |
| 7,215 | SyncSignature: A Simple, Efficient, Parallelizable Framework for Tree Similarity Joins | 2023 | VLDB | 4.7985991e-05 |
| 7,588 | Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases | 2013 | VLDB | 4.7030914e-05 |
| 7,708 | Efficient Top-k Algorithms for Approximate Substring Matching | 2013 | SIGMOD | 4.6721808e-05 |
| 7,777 | Indexing Mixed Types for Approximate Retrieval | 2005 | VLDB | 4.653704e-05 |
| 8,137 | Customizable and Scalable Fuzzy Join for Big Data | 2019 | VLDB | 4.5774794e-05 |
| 8,143 | Approximate Substring Matching over Uncertain Strings | 2011 | VLDB | 4.5768015e-05 |
| 8,306 | Online Windowed Subsequence Matching over Probabilistic Sequences | 2012 | SIGMOD | 4.5435639e-05 |
| 9,439 | On-the-Fly Token Similarity Joins in Relational Databases | 2014 | SIGMOD | 4.3423824e-05 |
| 9,832 | Balance-Aware Distributed String Similarity-Based Query Processing System | 2019 | VLDB | 4.2751057e-05 |
| 9,850 | COMPARE: Accelerating Groupwise Comparison in Relational Databases for Data Analytics | 2021 | VLDB | 4.2721228e-05 |
| 9,932 | Local Filtering: Improving the Performance of Approximate Queries on String Collections | 2015 | SIGMOD | 4.2500258e-05 |
| 9,933 | Efficient and Effective KNN Sequence Search with Approximate n-grams | 2014 | VLDB | 4.2500258e-05 |
| 10,216 | The Case For Language Model Approximated LIKE Predicate | 2026 | SIGMOD | 4.1945683e-05 |
| 10,706 | Extensible and Robust Evaluation of Similarity Queries | 2025 | VLDB | 4.1945683e-05 |
| 10,930 | Similarity Joins of Sparse Features | 2024 | SIGMOD | 4.1945683e-05 |
| 11,305 | TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching | 2023 | VLDB | 4.1945683e-05 |
| 11,724 | ZigZag: Supporting Similarity Queries on Vector Space Models | 2018 | SIGMOD | 4.1945683e-05 |
| 11,979 | Similarity Joins for Uncertain Strings | 2014 | SIGMOD | 4.1945683e-05 |
| 11,988 | MESA: A Map Service to Support Fuzzy Type-ahead Search over Geo-Textual Data | 2014 | VLDB | 4.1945683e-05 |
| 12,544 | SPIDER: Flexible Matching in Databases | 2005 | SIGMOD | 4.1945683e-05 |
| 12,582 | LexEQUAL: Multilexical Matching Operator in SQL | 2004 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| 3,529 | Merging the Results of Approximate Match Operations | 2004 | VLDB | 7.0059524e-05 |
| 3,226 | Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance | 2007 | VLDB | 7.3433307e-05 |
| 4,216 | Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints | 2010 | VLDB | 6.3521675e-05 |
| 7,777 | Indexing Mixed Types for Approximate Retrieval | 2005 | VLDB | 4.653704e-05 |
| 1,234 | Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints | 2008 | VLDB | 0.00013122499 |
| 9,430 | Approximate Joins: Concepts and Techniques | 2005 | VLDB | 4.3441378e-05 |
| 9,563 | Towards a Unified Framework for String Similarity Joins | 2019 | VLDB | 4.3254416e-05 |
| 11,979 | Similarity Joins for Uncertain Strings | 2014 | SIGMOD | 4.1945683e-05 |
| 7,708 | Efficient Top-k Algorithms for Approximate Substring Matching | 2013 | SIGMOD | 4.6721808e-05 |
| 4,901 | Probabilistic String Similarity Joins | 2010 | SIGMOD | 5.8411648e-05 |