Database Paper Browser

Approximate String Joins in a Database (Almost) for Free

Summary: Proposes implementing approximate string joins on commercial databases using q-grams. The method encodes q-gram match positions and counts so that the approximate-match predicate can be rewritten as a relational expression. Shows gains over UDF-based evaluation for full-string and substring joins; validated on real data and a prototype. (summarized by gpt-5-nano on Feb 09 2026)
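
The q-gram idea in the summary can be illustrated with a minimal Python sketch. It is not the paper's implementation, which expresses the filters as SQL over a commercial database. The function names (qgrams, passes_filters, edit_distance, approx_join), the padding characters '#' and '$', and the default q=3 are assumptions made for illustration. The sketch applies length, position, and count filtering to candidate pairs, then verifies survivors with an exact edit-distance check.

```python
def qgrams(s, q=3):
    """Positional q-grams of s, padded so every character appears in q grams."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [(i, padded[i:i + q]) for i in range(len(padded) - q + 1)]


def passes_filters(s1, s2, k, q=3):
    """Cheap necessary conditions for edit_distance(s1, s2) <= k."""
    # Length filter: strings within k edits differ in length by at most k.
    if abs(len(s1) - len(s2)) > k:
        return False
    # Position filter: only count equal q-grams whose positions differ by <= k.
    matches = sum(
        1
        for i, a in qgrams(s1, q)
        for j, b in qgrams(s2, q)
        if a == b and abs(i - j) <= k
    )
    # Count filter: strings within k edits share at least this many q-grams.
    return matches >= max(len(s1), len(s2)) - 1 - (k - 1) * q


def edit_distance(a, b):
    """Standard dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def approx_join(left, right, k, q=3):
    """Pairs (l, r) with edit distance <= k; filters prune before verifying."""
    return [
        (l, r)
        for l in left
        for r in right
        if passes_filters(l, r, k, q) and edit_distance(l, r) <= k
    ]


print(approx_join(["john smith"], ["jon smith", "mary jones"], k=1))
# [('john smith', 'jon smith')]
```

The filters never reject a true match, so the final edit-distance check only removes false positives. In a relational setting the same conditions become an equi-join on a q-gram table with a grouped count, which is what lets the database evaluate them without a per-pair UDF call.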

Paper ID: 8773
Venue: VLDB
Year: 2001
Pagerank: 0.00044847972
Overall Rank: 125 | 99.14%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 27 of 77 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
6,074 | Pigeonring: A Principle for Faster Thresholded Similarity Search | 2019 | VLDB | 5.2242306e-05
6,464 | Reference-Based Indexing of Sequence Databases | 2006 | VLDB | 5.0532607e-05
6,671 | Discovering Longest-lasting Correlation in Sequence Databases | 2013 | VLDB | 4.9669225e-05
6,726 | A Pivotal Prefix Based Filtering Algorithm for String Similarity Search | 2014 | SIGMOD | 4.9484027e-05
7,109 | Efficient Similarity Join and Search on Multi-Attribute Data | 2015 | SIGMOD | 4.8292998e-05
7,141 | Efficient Error-tolerant Query Autocompletion | 2013 | VLDB | 4.8197901e-05
7,215 | SyncSignature: A Simple, Efficient, Parallelizable Framework for Tree Similarity Joins | 2023 | VLDB | 4.7985991e-05
7,588 | Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases | 2013 | VLDB | 4.7030914e-05
7,708 | Efficient Top-k Algorithms for Approximate Substring Matching | 2013 | SIGMOD | 4.6721808e-05
7,777 | Indexing Mixed Types for Approximate Retrieval | 2005 | VLDB | 4.653704e-05
8,137 | Customizable and Scalable Fuzzy Join for Big Data | 2019 | VLDB | 4.5774794e-05
8,143 | Approximate Substring Matching over Uncertain Strings | 2011 | VLDB | 4.5768015e-05
8,306 | Online Windowed Subsequence Matching over Probabilistic Sequences | 2012 | SIGMOD | 4.5435639e-05
9,439 | On-the-Fly Token Similarity Joins in Relational Databases | 2014 | SIGMOD | 4.3423824e-05
9,832 | Balance-Aware Distributed String Similarity-Based Query Processing System | 2019 | VLDB | 4.2751057e-05
9,850 | COMPARE: Accelerating Groupwise Comparison in Relational Databases for Data Analytics | 2021 | VLDB | 4.2721228e-05
9,932 | Local Filtering: Improving the Performance of Approximate Queries on String Collections | 2015 | SIGMOD | 4.2500258e-05
9,933 | Efficient and Effective KNN Sequence Search with Approximate n-grams | 2014 | VLDB | 4.2500258e-05
10,216 | The Case For Language Model Approximated LIKE Predicate | 2026 | SIGMOD | 4.1945683e-05
10,706 | Extensible and Robust Evaluation of Similarity Queries | 2025 | VLDB | 4.1945683e-05
10,930 | Similarity Joins of Sparse Features | 2024 | SIGMOD | 4.1945683e-05
11,305 | TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching | 2023 | VLDB | 4.1945683e-05
11,724 | ZigZag: Supporting Similarity Queries on Vector Space Models | 2018 | SIGMOD | 4.1945683e-05
11,979 | Similarity Joins for Uncertain Strings | 2014 | SIGMOD | 4.1945683e-05
11,988 | MESA: A Map Service to Support Fuzzy Type-ahead Search over Geo-Textual Data | 2014 | VLDB | 4.1945683e-05
12,544 | SPIDER: Flexible Matching in Databases | 2005 | SIGMOD | 4.1945683e-05
12,582 | LexEQUAL: Multilexical Matching Operator in SQL | 2004 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 4 of 4 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
