Efficient Approximate Search on String Collections (Tutorial)
Summary: A tutorial surveying efficient approximate search on string collections. It covers indexes, search algorithms, filtering techniques, selectivity estimation, and related work, analyzes the merits and limitations of these approaches, and offers a synthesis aimed at scalable, practical design.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 9983
- Venue: VLDB
- Year: 2009
- Pagerank: 5.2879769e-05
- Overall Rank: 5,887 | 59.05%
- DOI:
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 19 of 19 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 125 | Approximate String Joins in a Database (Almost) for Free | 2001 | VLDB | 0.00044847972 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 322 | Record Linkage: Similarity Measures and Algorithms | 2006 | SIGMOD | 0.00027518768 |
| 1,202 | VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams | 2007 | VLDB | 0.00013326298 |
| 1,234 | Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints | 2008 | VLDB | 0.00013122499 |
| 1,533 | Example-driven Design of Efficient Record Matching Queries | 2007 | VLDB | 0.00011471971 |
| 1,830 | Relaxing Join and Selection Queries | 2006 | VLDB | 0.000103862 |
| 2,073 | Extending Autocompletion To Tolerate Errors | 2009 | SIGMOD | 9.6142791e-05 |
| 2,193 | Cost-Based Variable-Length-Gram Selection for String Collections to Support Approximate Queries Efficiently | 2008 | SIGMOD | 9.3178557e-05 |
| 2,386 | Leveraging Aggregate Constraints For Deduplication | 2007 | SIGMOD | 8.9231895e-05 |
| 2,779 | Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries | 2008 | VLDB | 8.1320575e-05 |
| 3,226 | Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance | 2007 | VLDB | 7.3433307e-05 |
| 3,868 | An Efficient Filter for Approximate Membership Checking | 2008 | SIGMOD | 6.6822543e-05 |
| 4,414 | Efficient Type-Ahead Search on Relational Data: a TASTIER Approach | 2009 | SIGMOD | 6.2056993e-05 |
| 4,438 | Selectivity Estimation for Fuzzy String Predicates in Large Data Sets | 2005 | VLDB | 6.1898903e-05 |
| 4,988 | Incremental Maintenance of Length Normalized Indexes for Approximate String Matching | 2009 | SIGMOD | 5.783959e-05 |
| 7,669 | Incorporating String Transformations in Record Matching | 2008 | SIGMOD | 4.6833751e-05 |
| 7,777 | Indexing Mixed Types for Approximate Retrieval | 2005 | VLDB | 4.653704e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,933 | Efficient and Effective KNN Sequence Search with Approximate n-grams | 2014 | VLDB | 4.2500258e-05 |
| 4,988 | Incremental Maintenance of Length Normalized Indexes for Approximate String Matching | 2009 | SIGMOD | 5.783959e-05 |
| 7,777 | Indexing Mixed Types for Approximate Retrieval | 2005 | VLDB | 4.653704e-05 |
| 12,294 | Worst-Case Efficient Range Search Indexing | 2009 | PODS | 4.1945683e-05 |
| 4,333 | An Efficient Index Structure for String Databases | 2001 | VLDB | 6.2805237e-05 |
| 1,184 | On Effective Multi-Dimensional Indexing for Strings | 2000 | SIGMOD | 0.00013455208 |
| 2,193 | Cost-Based Variable-Length-Gram Selection for String Collections to Support Approximate Queries Efficiently | 2008 | SIGMOD | 9.3178557e-05 |
| 8,660 | On Searching Compressed String Collections Cache-Obliviously | 2008 | PODS | 4.4722862e-05 |
| 7,708 | Efficient Top-k Algorithms for Approximate Substring Matching | 2013 | SIGMOD | 4.6721808e-05 |
| 9,932 | Local Filtering: Improving the Performance of Approximate Queries on String Collections | 2015 | SIGMOD | 4.2500258e-05 |