Selectivity Estimation for Fuzzy String Predicates in Large Data Sets
Summary: The paper proposes Sepia, a histogram-based selectivity estimator for fuzzy string predicates. Sepia clusters the strings, builds per-cluster and global histograms, and uses a pivot to relate the similarity between a query string q and a data string s through edit distance. The technique is extensible to other similarity measures and robust to nonuniform errors. (summarized by gpt-5-nano on Feb 09 2026)
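To make the estimated quantity concrete, the sketch below computes the exact selectivity of an edit-distance predicate by scanning every string. This is only the brute-force baseline that an estimator such as Sepia aims to approximate without a full scan; it is not Sepia's algorithm, and the function names and sample data are illustrative assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def exact_selectivity(strings, query, threshold):
    """Fraction of strings within `threshold` edits of `query` (full scan)."""
    if not strings:
        return 0.0
    hits = sum(1 for s in strings if edit_distance(s, query) <= threshold)
    return hits / len(strings)


# Hypothetical sample data, chosen only for illustration.
names = ["smith", "smyth", "smithe", "jones", "johnson"]
print(exact_selectivity(names, "smith", 1))  # prints 0.6
```

The scan costs one edit-distance computation per stored string, which is why a query optimizer would prefer a compact summary such as a histogram over computing this value directly.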
Incoming Citations (Sorted by Pagerank)
Showing 10 of 10 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 14 of 14 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,442 | Approximating Predicates and Expressive Queries on Probabilistic Databases | 2008 | PODS | 6.186154e-05 |
| 4,026 | Flexible String Matching Against Large Databases in Practice | 2004 | VLDB | 6.5169976e-05 |
| 372 | Selectivity Estimation using Probabilistic Models | 2001 | SIGMOD | 2.5354779e-04 |
| 11,979 | Similarity Joins for Uncertain Strings | 2014 | SIGMOD | 4.1945683e-05 |
| 4,901 | Probabilistic String Similarity Joins | 2010 | SIGMOD | 5.8411648e-05 |
| 64 | Improved Histograms for Selectivity Estimation of Range Predicates | 1996 | SIGMOD | 6.3612837e-04 |
| 5,082 | A Comparison of Selectivity Estimators for Range Queries on Metric Attributes | 1999 | SIGMOD | 5.711623e-05 |
| 2,171 | Selectivity Estimation For Boolean Queries | 2000 | PODS | 9.3807165e-05 |
| 897 | Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distributions of Column Values | 1988 | VLDB | 1.5528028e-04 |
| 3,226 | Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance | 2007 | VLDB | 7.3433307e-05 |