Back to papers
A Pivotal Prefix Based Filtering Algorithm for String Similarity Search
Summary: Introduces a pivotal prefix filter for string similarity search under edit distance, drastically reducing signatures and boosting pruning power. A dynamic-programming method selects high-quality pivotal prefixes to prune non-consecutive errors, while an alignment filter prunes consecutive errors; experiments on real datasets show order-of-magnitude speedups over baselines.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID
- 4805
- Venue
- SIGMOD
- Year
- 2014
- Pagerank
- 4.9484027e-05
- Overall Rank
- 6,726 | 53.21%
- DOI
-
10.1145/2588555.2593675
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 14 of 14 citing papers.
| Rank |
Citing Paper |
Year |
Venue |
Pagerank |
| 2,740 |
String Similarity Joins: An Experimental Evaluation |
2014 |
VLDB |
8.1980628e-05 |
| 4,250 |
Local Similarity Search for Unstructured Text |
2016 |
SIGMOD |
6.3241139e-05 |
| 4,353 |
Overlap Set Similarity Joins with Theoretical Guarantees |
2018 |
SIGMOD |
6.263585e-05 |
| 5,179 |
SilkMoth: An Efficient Method for Finding Related Sets with Maximum Matching Constraints |
2017 |
VLDB |
5.6428428e-05 |
| 5,291 |
Fast Subtrajectory Similarity Search in Road Networks under Weighted Edit Distance Constraints |
2020 |
VLDB |
5.5826473e-05 |
| 6,074 |
Pigeonring: A Principle for Faster Thresholded Similarity Search |
2019 |
VLDB |
5.2242306e-05 |
| 7,109 |
Efficient Similarity Join and Search on Multi-Attribute Data |
2015 |
SIGMOD |
4.8292998e-05 |
| 7,635 |
Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts |
2021 |
SIGMOD |
4.6908858e-05 |
| 8,291 |
TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection |
2022 |
SIGMOD |
4.5435639e-05 |
| 9,567 |
META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion |
2016 |
VLDB |
4.3254416e-05 |
| 9,832 |
Balance-Aware Distributed String Similarity-Based Query Processing System |
2019 |
VLDB |
4.2751057e-05 |
| 9,876 |
Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation |
2023 |
SIGMOD |
4.2667743e-05 |
| 9,932 |
Local Filtering: Improving the Performance of Approximate Queries on String Collections |
2015 |
SIGMOD |
4.2500258e-05 |
| 11,724 |
ZigZag: Supporting Similarity Queries on Vector Space Models |
2018 |
SIGMOD |
4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 15 of 15 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank |
Cited Paper |
Year |
Venue |
Pagerank |
| 125 |
Approximate String Joins in a Database (Almost) for Free |
2001 |
VLDB |
0.00044847972 |
| 155 |
Robust and Efficient Fuzzy Match for Online Data Cleaning |
2003 |
SIGMOD |
0.00040637896 |
| 266 |
Efficient Exact Set-Similarity Joins |
2006 |
VLDB |
0.00029718727 |
| 1,202 |
VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams |
2007 |
VLDB |
0.00013326298 |
| 1,234 |
Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints |
2008 |
VLDB |
0.00013122499 |
| 1,396 |
Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search |
2012 |
SIGMOD |
0.00012204748 |
| 2,073 |
Extending Autocompletion To Tolerate Errors |
2009 |
SIGMOD |
9.6142791e-05 |
| 2,376 |
Bed-Tree: An All-Purpose Index Structure for String Similarity Search Based on Edit Distance |
2010 |
SIGMOD |
8.9424361e-05 |
| 2,592 |
Pass-Join: A Partition-based Method for Similarity Joins |
2012 |
VLDB |
8.4795761e-05 |
| 2,740 |
String Similarity Joins: An Experimental Evaluation |
2014 |
VLDB |
8.1980628e-05 |
| 3,774 |
Efficient Exact Edit Similarity Query Processing with the Asymmetric Signature Scheme |
2011 |
SIGMOD |
6.7757301e-05 |
| 4,216 |
Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints |
2010 |
VLDB |
6.3521675e-05 |
| 4,988 |
Incremental Maintenance of Length Normalized Indexes for Approximate String Matching |
2009 |
SIGMOD |
5.783959e-05 |
| 7,141 |
Efficient Error-tolerant Query Autocompletion |
2013 |
VLDB |
4.8197901e-05 |
| 7,708 |
Efficient Top-k Algorithms for Approximate Substring Matching |
2013 |
SIGMOD |
4.6721808e-05 |
Semantically Similar Papers
| Overall Rank |
Paper |
Year |
Venue |
Pagerank |
| 2,592 |
Pass-Join: A Partition-based Method for Similarity Joins |
2012 |
VLDB |
8.4795761e-05 |
| 9,933 |
Efficient and Effective KNN Sequence Search with Approximate n-grams |
2014 |
VLDB |
4.2500258e-05 |
| 11,979 |
Similarity Joins for Uncertain Strings |
2014 |
SIGMOD |
4.1945683e-05 |
| 1,234 |
Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints |
2008 |
VLDB |
0.00013122499 |
| 7,109 |
Efficient Similarity Join and Search on Multi-Attribute Data |
2015 |
SIGMOD |
4.8292998e-05 |
| 1,396 |
Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search |
2012 |
SIGMOD |
0.00012204748 |
| 4,216 |
Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints |
2010 |
VLDB |
6.3521675e-05 |
| 9,932 |
Local Filtering: Improving the Performance of Approximate Queries on String Collections |
2015 |
SIGMOD |
4.2500258e-05 |
| 3,774 |
Efficient Exact Edit Similarity Query Processing with the Asymmetric Signature Scheme |
2011 |
SIGMOD |
6.7757301e-05 |
| 7,708 |
Efficient Top-k Algorithms for Approximate Substring Matching |
2013 |
SIGMOD |
4.6721808e-05 |