Database Paper Browser


Random Sampling for Histogram Construction: How much is enough?

Summary: Derives optimal bounds on the random-sample size needed to build equi-height histograms under a region-sensitive error metric, and proposes adaptive page-level sampling that exploits the clustering of values on disk pages. Argues that distinct-value estimation is hard and offers a practical estimator for query optimizers. A SQL Server 7.0 prototype confirms the accuracy of the approach. (summarized by gpt-5-nano on Feb 09 2026)
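As a rough illustration of the basic idea behind sampling-based equi-height histograms (not the paper's algorithm, its region-sensitive error metric, or its sample-size bound), the sketch below draws a uniform row-level sample, sorts it, and takes evenly spaced order statistics as bucket boundaries. The function name equi_height_from_sample and its parameters are hypothetical.

```python
import random


def equi_height_from_sample(values, num_buckets, sample_size, seed=0):
    """Estimate equi-height bucket boundaries from a uniform random sample.

    Hypothetical helper for illustration only: samples individual rows
    without replacement (the paper's adaptive scheme samples disk pages),
    sorts the sample, and returns the upper boundary of each bucket.
    """
    rng = random.Random(seed)
    # Uniform sample without replacement; capped at the number of rows.
    sample = sorted(rng.sample(list(values), min(sample_size, len(values))))
    # Each bucket should hold about len(sample) / num_buckets sampled rows.
    step = len(sample) / num_buckets
    boundaries = []
    for i in range(num_buckets):
        # Upper boundary of bucket i is the sampled value at rank (i + 1) * step;
        # min() guards against floating-point overshoot on the last bucket.
        idx = min(len(sample) - 1, int((i + 1) * step) - 1)
        boundaries.append(sample[idx])
    return boundaries


if __name__ == "__main__":
    column = list(range(1, 10_001))
    print(equi_height_from_sample(column, num_buckets=4, sample_size=500))
```

When the sample covers every row (for example, all 100 values of 1..100 with 4 buckets), the boundaries are exactly 25, 50, 75, 100. A smaller sample yields approximate boundaries whose error depends on the sample size, which is the "how much is enough?" question the paper addresses.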

Paper ID: 3034
Venue: SIGMOD
Year: 1998
Pagerank: 0.00020803682
Overall Rank: 530 | 96.32%
DOI: -


Authors

Incoming Citations (Sorted by Pagerank)

Showing 41 of 41 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
34 | Similarity Search in High Dimensions via Hashing | 1999 | VLDB | 0.00076637636
43 | Models and Issues in Data Stream Systems | 2002 | PODS | 0.00072723062
126 | Space-Efficient Online Computation of Quantile Summaries | 2001 | SIGMOD | 0.00044744986
308 | Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports | 2001 | VLDB | 0.00028142852
325 | The History of Histograms (abridged) | 2003 | VLDB | 0.00027378328
443 | Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets | 1999 | SIGMOD | 0.00022996573
449 | Approximate Query Processing: Taming the TeraBytes! A Tutorial | 2001 | VLDB | 0.00022846068
454 | An Overview of Query Optimization in Relational Systems | 1998 | PODS | 0.00022734812
516 | AutoAdmin "What-if" Index Analysis Utility | 1998 | SIGMOD | 0.00021196031
759 | To Search or to Crawl? Towards a Query Optimizer for Text-Centric Tasks | 2006 | SIGMOD | 0.00017064615
943 | Wander Join: Online Aggregation via Random Walks | 2016 | SIGMOD | 0.00015145883
1,323 | Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters | 2016 | SIGMOD | 0.00012601997
1,574 | Approximate Query Processing: No Silver Bullet | 2017 | SIGMOD | 0.00011287495
1,695 | Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-Size Estimation | 1999 | VLDB | 0.00010882793
1,797 | Effective Use of Block-Level Sampling in Statistics Estimation | 2004 | SIGMOD | 0.00010523169
2,282 | Summarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling | 2005 | VLDB | 9.1073603e-05
3,013 | Cardinality Estimation Using Sample Views with Quality Assurance | 2007 | SIGMOD | 7.7137441e-05
3,050 | Comparing Data Streams Using Hamming Norms (How to Zero In) | 2002 | VLDB | 7.6512619e-05
3,310 | Optimal and Approximate Computation of Summary Statistics for Range Aggregates | 2001 | PODS | 7.2408955e-05
3,878 | Data Canopy: Accelerating Exploratory Statistical Analysis | 2017 | SIGMOD | 6.6731435e-05
4,031 | Approximate Quantiles and the Order of the Stream | 2006 | PODS | 6.5121141e-05
4,681 | Adaptive Sampling for Rapidly Matching Histograms | 2018 | VLDB | 6.0034918e-05
5,082 | A Comparison of Selectivity Estimators for Range Queries on Metric Attributes | 1999 | SIGMOD | 5.711623e-05
5,113 | Columnstore and B+ tree – Are Hybrid Physical Designs Important? | 2018 | SIGMOD | 5.687445e-05
5,150 | Efficient Join Synopsis Maintenance for Data Warehouse | 2020 | SIGMOD | 5.6626586e-05
5,457 | Fast and Approximate Stream Mining of Quantiles and Frequencies Using Graphics Processors | 2005 | SIGMOD | 5.4970777e-05
5,632 | Bloom Histogram: Path Selectivity Estimation for XML Data with Updates | 2004 | VLDB | 5.4014372e-05
5,688 | PREDIcT: Towards Predicting the Runtime of Large Scale Iterative Analytics | 2013 | VLDB | 5.3702808e-05
5,879 | Fast and Near–Optimal Algorithms for Approximating Distributions by Histograms | 2015 | PODS | 5.2908101e-05
6,170 | PolarDB-IMCI: A Cloud-Native HTAP Database System at Alibaba | 2023 | SIGMOD | 5.171601e-05
6,444 | Evaluating Interactive Data Systems: Workloads, Metrics, and Guidelines | 2018 | SIGMOD | 5.059132e-05
6,548 | Query Sampling in DB2 Universal Database | 2004 | SIGMOD | 5.0181595e-05
6,637 | Approximating and Testing k-Histogram Distributions in Sub-linear Time | 2012 | PODS | 4.9816401e-05
7,358 | Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries | 2021 | SIGMOD | 4.7529363e-05
8,443 | Histograms as a Side Effect of Data Movement for Big Data | 2014 | SIGMOD | 4.5119257e-05
8,610 | Efficient Dynamic Weighted Set Sampling and Its Extension | 2024 | VLDB | 4.4853485e-05
9,162 | Estimating Quantiles from the Union of Historical and Streaming Data | 2017 | VLDB | 4.3849295e-05
10,215 | Task Cascades for Efficient Unstructured Data Processing | 2026 | SIGMOD | 4.1945683e-05
10,498 | PLM4NDV: Minimizing Data Access for Number of Distinct Values Estimation with Pre-trained Language Models | 2025 | SIGMOD | 4.1945683e-05
10,534 | AdaNDV: Adaptive Number of Distinct Value Estimation via Learning to Select and Fuse Estimators | 2025 | VLDB | 4.1945683e-05
11,821 | Are Few Bins Enough: Testing Histogram Distributions | 2016 | PODS | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers