On the String Matching with k Differences in DNA Databases
Summary: Proposes an index built on BWT(y), the Burrows-Wheeler transform of the target string y, for approximate string matching with up to k differences (edits) in large DNA databases, enabling efficient querying of short patterns against massive genomes. The method decomposes the pattern x into l subpatterns, uses BWT(y) to locate candidate occurrences of each subpattern within floor(k/l) differences, then rechecks each candidate against the full pattern. The stated running time is O(k|T|) with |T| ≈ O(|Σ| 2^k), which scales for k ≤ log_{|Σ|} n; experiments show promise. (summarized by gpt-5-nano on Feb 09 2026)
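The filter-then-recheck idea in the summary can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes the special case l = k + 1, so floor(k/l) = 0 and each subpattern is searched exactly, it builds a naive suffix array and BWT that only suit small inputs, and it rechecks candidates with the classic Sellers dynamic program. All function names (`build_index`, `exact_positions`, `sellers_ends`, `k_differences`) are hypothetical.

```python
def build_index(text):
    """Naive suffix array plus BWT tables for text + '$' (small inputs only)."""
    t = text + "$"  # assumes '$' does not occur in text
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    bwt = "".join(t[i - 1] for i in sa)  # t[-1] is '$' when i == 0
    # first[c] = number of characters in t strictly smaller than c
    first, total = {}, 0
    for c in sorted(set(t)):
        first[c] = total
        total += t.count(c)
    # occ[c][i] = occurrences of c in bwt[:i]
    occ = {c: [0] * (len(t) + 1) for c in first}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ


def exact_positions(index, piece):
    """Backward search: start positions of exact occurrences of piece."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open suffix-array interval
    for c in reversed(piece):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def sellers_ends(pattern, window, k):
    """End offsets (exclusive) in window where pattern matches within k edits."""
    m = len(pattern)
    col = list(range(m + 1))  # DP column for the empty window prefix
    ends = []
    for j, ch in enumerate(window, start=1):
        prev_diag, col[0] = col[0], 0  # a match may start anywhere
        for i in range(1, m + 1):
            cur = min(col[i] + 1,        # extra character in the window
                      col[i - 1] + 1,    # pattern character left unmatched
                      prev_diag + (pattern[i - 1] != ch))
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            ends.append(j)
    return ends


def k_differences(text, pattern, k):
    """Sorted end positions (exclusive) of matches with at most k edits."""
    m = len(pattern)
    if m <= k:
        raise ValueError("pattern must be longer than k")
    index = build_index(text)
    pieces = k + 1  # assumption: l = k + 1, so each piece is matched exactly
    bounds = [i * m // pieces for i in range(pieces + 1)]
    ends = set()
    for a, b in zip(bounds, bounds[1:]):
        for p in exact_positions(index, pattern[a:b]):
            # Recheck only a window that can contain the whole occurrence.
            lo = max(0, p - a - k)
            hi = min(len(text), p - a + m + k)
            ends.update(lo + e for e in sellers_ends(pattern, text[lo:hi], k))
    return sorted(ends)
```

With at most k edits spread over k + 1 pieces, at least one piece of any true match must appear unchanged, so the exact search cannot miss an occurrence; the recheck then removes false candidates. Handling a general l with floor(k/l) differences per piece, as the summary describes, would require an approximate search over the index rather than the exact backward search used here.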
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Yangjun Chen
2. Hoang Hai Nguyen
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 0 of 0 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,983 | A Generic Framework for Efficient and Effective Subsequence Retrieval | 2012 | VLDB | 4.8732757e-05 |
| 5,812 | Reference-Based Alignment in Large Sequence Databases | 2009 | VLDB | 5.3172025e-05 |
| 2,583 | Practical Suffix Tree Construction | 2004 | VLDB | 8.497732e-05 |
| 8,306 | Online Windowed Subsequence Matching over Probabilistic Sequences | 2012 | SIGMOD | 4.5435639e-05 |
| 5,397 | Fast nGram-Based String Search Over Data Encoded Using Algebraic Signatures | 2007 | VLDB | 5.5299002e-05 |
| 12,365 | Improving Suffix Array Locality for Fast Pattern Matching on Disk | 2008 | SIGMOD | 4.1945683e-05 |
| 6,464 | Reference-Based Indexing of Sequence Databases | 2006 | VLDB | 5.0532607e-05 |
| 2,376 | Bed-Tree: An All-Purpose Index Structure for String Similarity Search Based on Edit Distance | 2010 | SIGMOD | 8.9424361e-05 |
| 7,708 | Efficient Top-k Algorithms for Approximate Substring Matching | 2013 | SIGMOD | 4.6721808e-05 |
| 4,333 | An Efficient Index Structure for String Databases | 2001 | VLDB | 6.2805237e-05 |