Database Paper Browser

Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries

Summary: Hashed Samples designs selectivity estimators for weighted set similarity queries (TF-IDF/BM25) using samples constructed a priori. It avoids the pitfalls of uniform sampling, comes with theoretical accuracy proofs, and delivers orders-of-magnitude speedups over exact solutions with small space overhead. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 9669
Venue: VLDB
Year: 2008
Pagerank: 8.1320575e-05
Overall Rank: 2,779 | 80.67%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 19 of 19 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,204 | VerdictDB: Universalizing Approximate Query Processing | 2018 | SIGMOD | 0.00013319541
1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748
2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05
3,578 | Efficient Approximate Entity Extraction with Edit Distance Constraints | 2009 | SIGMOD | 6.9503858e-05
4,873 | Power-Law Based Estimation of Set Similarity Join Size | 2009 | VLDB | 5.8602304e-05
5,073 | Faerie: Efficient Filtering Algorithms for Approximate Dictionary-based Entity Extraction | 2011 | SIGMOD | 5.7177424e-05
5,220 | Similarity Join Size Estimation using Locality Sensitive Hashing | 2011 | VLDB | 5.6216111e-05
5,415 | Coordinated Weighted Sampling for Estimating Aggregates Over Multiple Weight Assignments | 2009 | VLDB | 5.5196338e-05
5,887 | Efficient Approximate Search on String Collections (Tutorial) | 2009 | VLDB | 5.2879769e-05
5,951 | PGMJoins: Random Join Sampling with Graphical Models | 2021 | SIGMOD | 5.2592385e-05
6,493 | Joins on Samples: A Theoretical Guide for Practitioners | 2020 | VLDB | 5.0424713e-05
7,645 | Selectivity Estimation on Streaming Spatio-Textual Data Using Local Correlations | 2015 | VLDB | 4.6896215e-05
8,921 | Leveraging Similarity Joins for Signal Reconstruction | 2018 | VLDB | 4.427232e-05
9,691 | Selectivity Estimation for Queries Containing Predicates over Set-Valued Attributes | 2023 | SIGMOD | 4.3035354e-05
10,590 | ACE: A Cardinality Estimator for Set-Valued Queries | 2025 | VLDB | 4.1945683e-05
11,504 | LES3: Learning-based Exact Set Similarity Search | 2021 | VLDB | 4.1945683e-05
11,533 | Tanium Reveal: A Federated Search Engine for Querying Unstructured File Data on Large Enterprise Networks | 2021 | VLDB | 4.1945683e-05
12,166 | Get the Most out of Your Sample: Optimal Unbiased Estimators using Partial Information | 2011 | PODS | 4.1945683e-05
13,287 | Orca-SR: A Real-Time Traffic Engineering Framework leveraging Similarity Joins | 2020 | VLDB | -

Outgoing Citations (Sorted by Pagerank)

Showing 11 of 11 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers