Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries
Summary: Hashed Samples designs selectivity estimators for weighted set similarity queries (TF-IDF/BM25) using a priori constructed samples. It avoids uniform sampling pitfalls, proves accuracy theoretically, and delivers orders-of-magnitude speedups with small space overhead compared with exact solutions. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Citations (Sorted by Pagerank)
Showing 19 of 19 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 11 of 11 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 3.0661988e-04 |
| 10,227 | Sample-based Distinct Cardinality Estimation for Multiple Attributes in Multi-Dataset Queries | 2026 | VLDB | 4.1945683e-05 |
| 1,241 | Multi-dimensional Selectivity Estimation Using Compressed Histogram Information | 1999 | SIGMOD | 1.3097578e-04 |
| 7,771 | Modeling High-Dimensional Index Structures using Sampling | 2001 | SIGMOD | 4.6560482e-05 |
| 4,278 | Similarity Query Processing for High-Dimensional Data | 2020 | VLDB | 6.2953764e-05 |
| 9,380 | Small Selectivities Matter: Lifting the Burden of Empty Samples | 2021 | SIGMOD | 4.3461329e-05 |
| 5,220 | Similarity Join Size Estimation using Locality Sensitive Hashing | 2011 | VLDB | 5.6216111e-05 |
| 1,981 | Improved Selectivity Estimation by Combining Knowledge from Sampling and Synopses | 2018 | VLDB | 9.8687545e-05 |
| 7,522 | Efficient and Tunable Similar Set Retrieval | 2001 | SIGMOD | 4.7180617e-05 |
| 2,171 | Selectivity Estimation For Boolean Queries | 2000 | PODS | 9.3807165e-05 |