Selectivity Estimation For Boolean Queries
Summary: Proposes compact Monte Carlo set-hash signatures, each representing the set of strings that contain a given substring. These signatures let the estimator compute correlations among substring predicates on the fly, so it can estimate the selectivity of arbitrary Boolean substring queries. The method is approximate and space-efficient, and it scales to the super-exponential number of Boolean combinations of predicates. In the reported experiments it is more accurate than estimates that assume predicate independence, for applications such as information retrieval, query optimization, and query refinement. (summarized by gpt-5-mini on Feb 09 2026)
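The summary describes set-hash signatures only at a high level. The sketch below illustrates the general idea with min-wise hashing, a standard set-hash technique, under stated assumptions. Each hashed row gets an independent random value per hash table. The exact number of rows matching each substring is known; here it comes from a full scan, while the paper's own count source is not described in this summary. Only two-predicate queries are handled. All function names (`make_hash_tables`, `signature`, `resemblance`, `estimate_and`) and the toy data are hypothetical illustrations, not the authors' code.

```python
import random

EMPTY = 1 << 64  # sentinel larger than any 64-bit value; signature entry of an empty set


def make_hash_tables(num_rows, k, seed=0):
    """Hypothetical helper: k independent random functions over row ids 0..num_rows-1.

    Each table assigns every row an independent random 64-bit value, so the row
    holding the minimum value of a set is (up to ties) uniform over that set.
    """
    rng = random.Random(seed)
    return [[rng.getrandbits(64) for _ in range(num_rows)] for _ in range(k)]


def signature(row_ids, tables):
    """Min-hash signature of a set of row ids: one minimum per hash table."""
    return [min((t[r] for r in row_ids), default=EMPTY) for t in tables]


def resemblance(sig_a, sig_b):
    """Fraction of agreeing positions, an estimate of J = |A & B| / |A | B|."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def estimate_and(sig_a, sig_b, size_a, size_b):
    """Estimate |A & B| from the resemblance J and the exact set sizes.

    Since |A| + |B| = |A | B| + |A & B| and |A | B| = |A & B| / J,
    it follows that |A & B| = J * (|A| + |B|) / (1 + J).
    """
    j = resemblance(sig_a, sig_b)
    return j * (size_a + size_b) / (1 + j)


# Hypothetical toy table: every 4th string contains both "ab" and "cd",
# the next one contains only "ab", and the remaining two contain neither.
strings = ["xxabcdxx", "xxabxx", "xxxxxx", "xxxxxx"] * 250
n = len(strings)
tables = make_hash_tables(n, k=200, seed=1)

# Assumption: exact per-substring row sets/counts are available (here from a full scan).
rows = {p: {i for i, s in enumerate(strings) if p in s} for p in ("ab", "cd")}
sigs = {p: signature(ids, tables) for p, ids in rows.items()}
size_ab, size_cd = len(rows["ab"]), len(rows["cd"])

est_and = estimate_and(sigs["ab"], sigs["cd"], size_ab, size_cd)
est_or = size_ab + size_cd - est_and       # "ab" OR "cd" by inclusion-exclusion
est_and_not = size_ab - est_and            # "ab" AND NOT "cd"
independent_and = size_ab * size_cd / n    # conjunction estimate assuming independence
true_and = len(rows["ab"] & rows["cd"])

print(f"true={true_and} minhash~{est_and:.0f} independence={independent_and:.0f}")
```

On this toy table the two substrings are nested ("cd" only appears alongside "ab"), so the true conjunction count is 250, while the independence estimate is 500 × 250 / 1000 = 125. The signature-based estimate should land near 250, because it measures the correlation directly instead of assuming it away. The same intersection estimate also yields OR and AND NOT counts by inclusion-exclusion.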
Authors
1. Zhiyuan Chen
2. Flip Korn
3. Nick Koudas
4. S. Muthukrishnan
Incoming Citations (Sorted by Pagerank)
Showing 14 of 14 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Overall Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 326 | Optimal Histograms with Quality Guarantees | 1998 | VLDB | 2.7358981e-04 |
| 1,146 | Estimating Alphanumeric Selectivity in the Presence of Wildcards | 1996 | SIGMOD | 1.3679782e-04 |
| 1,379 | Substring Selectivity Estimation | 1999 | PODS | 1.2286879e-04 |
| 3,035 | Multi-Dimensional Substring Selectivity Estimation | 1999 | VLDB | 7.6748073e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,708 | Is Min-Wise Hashing Optimal for Summarizing Set Intersection? | 2014 | PODS | 6.8247903e-05 |
| 372 | Selectivity Estimation using Probabilistic Models | 2001 | SIGMOD | 2.5354779e-04 |
| 9,691 | Selectivity Estimation for Queries Containing Predicates over Set-Valued Attributes | 2023 | SIGMOD | 4.3035354e-05 |
| 1,146 | Estimating Alphanumeric Selectivity in the Presence of Wildcards | 1996 | SIGMOD | 1.3679782e-04 |
| 4,438 | Selectivity Estimation for Fuzzy String Predicates in Large Data Sets | 2005 | VLDB | 6.1898903e-05 |
| 1,981 | Improved Selectivity Estimation by Combining Knowledge from Sampling and Synopses | 2018 | VLDB | 9.8687545e-05 |
| 5,813 | Space-efficient Substring Occurrence Estimation | 2011 | PODS | 5.3170565e-05 |
| 3,035 | Multi-Dimensional Substring Selectivity Estimation | 1999 | VLDB | 7.6748073e-05 |
| 1,379 | Substring Selectivity Estimation | 1999 | PODS | 1.2286879e-04 |
| 2,779 | Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries | 2008 | VLDB | 8.1320575e-05 |