An Evaluation of N-Gram Selection Strategies for Regular Expression Indexing in Contemporary Text Analysis Tasks
Summary: A systematic evaluation of three representative n-gram selection strategies for regex indexing across five contemporary workloads (including production logs and genomics), measuring index build time, storage, false positives, and end-to-end query performance. The paper reports empirical trade-offs on modern workloads and releases an open-source unified benchmarking framework and implementations to guide scalable regex-index design. (summarized by gpt-5-mini on Feb 09 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,106 | Regular Expression Indexing for Log Analysis | 2026 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 391 | Indexing and Querying XML Data for Regular Path Expressions | 2001 | VLDB | 0.00024564567 |
| 1,270 | BitWeaving: Fast Scans for Main Memory Data Processing | 2013 | SIGMOD | 0.00012926086 |
| 3,526 | RE-Tree: An Efficient Index Structure for Regular Expressions | 2002 | VLDB | 7.0078308e-05 |
| 9,826 | Exploiting Structure in Regular Expression Queries | 2023 | SIGMOD | 4.2751057e-05 |