Database Paper Browser


Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports

Summary: Distinct sampling collects a specially tailored sample over the distinct values of the input in a single pass. It yields accurate distinct-value estimates from small samples, in contrast to prior negative results for sampling-based estimation. The sample can be maintained incrementally under updates, supports range queries, and yields 0-10% error with 2-4x speedups in high-volume event reporting. (summarized by gpt-5-nano on Feb 09 2026)
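The single-pass idea in the summary can be illustrated with a minimal Python sketch. It assumes a hash-based level scheme: each distinct value gets a level by hashing, the sample keeps every value whose level is at or above the current level, and the level is raised (halving the sampling rate) whenever the number of retained values exceeds a capacity bound. Scaling the retained count by 2**level then estimates the number of distinct values, optionally restricted by a predicate such as a range. The names `level_of`, `DistinctSample`, and `capacity` are hypothetical, and the sketch omits deletion handling and any per-value sample rows the paper's structure may keep, so treat it as an illustration rather than the paper's exact algorithm.

```python
import hashlib
from collections import defaultdict


def level_of(value, max_level=64):
    """Hypothetical hash-to-level map: trailing zero bits of a 64-bit hash,
    so each distinct value has level >= l with probability 2**-l."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    h = int.from_bytes(digest, "big")
    if h == 0:
        return max_level
    return (h & -h).bit_length() - 1  # index of the lowest set bit


class DistinctSample:
    """Sketch of a distinct sample: retains every distinct value whose level
    is >= the current level, raising the level when capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity          # assumed bound on retained distinct values
        self.level = 0                    # current sampling level (rate = 2**-level)
        self.counts = defaultdict(int)    # retained value -> occurrence count

    def insert(self, value):
        if level_of(value) < self.level:
            return  # not sampled at the current level
        self.counts[value] += 1
        while len(self.counts) > self.capacity:
            # Halve the sampling rate: evict values below the new level.
            self.level += 1
            self.counts = defaultdict(
                int, {v: c for v, c in self.counts.items() if level_of(v) >= self.level}
            )

    def estimate_distinct(self, predicate=lambda v: True):
        # Each retained value stands in for roughly 2**level distinct values.
        kept = sum(1 for v in self.counts if predicate(v))
        return kept * (2 ** self.level)


# Usage: 5,000 distinct values, each seen twice, with room for only 64.
ds = DistinctSample(capacity=64)
for i in range(10_000):
    ds.insert(i % 5_000)
print(ds.level, ds.estimate_distinct(), ds.estimate_distinct(lambda v: v < 1_000))
```

Because the level test depends only on the hash of a value, every occurrence of a retained value is counted and every occurrence of an evicted value is skipped, which is what lets a single scan estimate distinct counts rather than row counts.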

Paper ID: 8778
Venue: VLDB
Year: 2001
Pagerank: 0.00028142852
Overall Rank: 308 | 97.86%
DOI: -


Authors

Incoming Citations (Sorted by Pagerank)

Showing 46 of 46 citing papers.

Rank Citing Paper Year Venue Pagerank
383 An Optimal Algorithm for the Distinct Elements Problem 2010 PODS 0.00024820873
449 Approximate Query Processing: Taming the TeraBytes! A Tutorial 2001 VLDB 0.00022846068
475 Mining Database Structure; Or, How to Build a Data Quality Browser 2002 SIGMOD 0.00022303253
477 Model-Driven Data Acquisition in Sensor Networks 2004 VLDB 0.00022221803
629 Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors 2009 VLDB 0.00018942366
727 On Synopses for Distinct-Value Estimation Under Multiset Operations 2007 SIGMOD 0.00017508726
956 How to Summarize the Universe: Dynamic Maintenance of Quantiles 2002 VLDB 0.00015066967
1,392 Sketching Streams Through the Net: Distributed Approximate Query Tracking 2005 VLDB 0.00012229045
1,472 Space Efficient Mining of Multigraph Streams 2005 PODS 0.00011828662
1,683 Cardinality Estimation: An Experimental Survey 2018 VLDB 0.00010922679
2,011 Rapid Sampling for Visualizations with Ordering Guarantees 2015 VLDB 9.7964875e-05
2,118 Using Probabilistic Models for Data Management in Acquisitional Environments 2005 CIDR 9.5100739e-05
2,282 Summarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling 2005 VLDB 9.1073603e-05
2,837 Correlation Maps: A Compressed Access Method for Exploiting Soft Functional Dependencies 2009 VLDB 8.0414149e-05
2,931 Holistic Aggregates in a Networked World: Distributed Tracking of Approximate Quantiles 2005 SIGMOD 7.8697258e-05
3,050 Comparing Data Streams Using Hamming Norms (How to Zero In) 2002 VLDB 7.6512619e-05
3,102 Processing Set Expressions over Continuous Update Streams 2003 SIGMOD 7.5586568e-05
3,486 Holistic UDAFs at Streaming Speeds 2004 SIGMOD 7.0502199e-05
3,558 Approximate Selection with Guarantees using Proxies 2020 VLDB 6.9765724e-05
3,702 Every Row Counts: Combining Sketches and Sampling for Accurate Group-By Result Estimates 2019 CIDR 6.8295759e-05
3,867 CORADD: Correlation Aware Database Designer for Materialized Views and Indexes 2010 VLDB 6.683173e-05
4,014 Exploiting Correlations for Expensive Predicate Evaluation 2015 SIGMOD 6.5273084e-05
4,350 On Biased Reservoir Sampling in the Presence of Stream Evolution 2006 VLDB 6.2645054e-05
4,718 Weighted Reservoir Sampling from Distributed Streams 2019 PODS 5.9749691e-05
5,117 Sampling Algorithms in a Stream Operator 2005 SIGMOD 5.6825418e-05
5,264 SeeDB: Visualizing Database Queries Efficiently 2014 VLDB 5.597302e-05
5,415 Coordinated Weighted Sampling for Estimating Aggregates Over Multiple Weight Assignments 2009 VLDB 5.5196338e-05
5,673 Distributed Set-Expression Cardinality Estimation 2004 VLDB 5.3780919e-05
5,783 Extended Wavelets for Multiple Measures 2003 SIGMOD 5.3289633e-05
5,905 Exploiting Ordered Dictionaries to Efficiently Construct Histograms with Q-Error Guarantees in SAP HANA 2014 SIGMOD 5.2788785e-05
6,190 Maintaining Bernoulli Samples over Evolving Multisets 2007 PODS 5.1645517e-05
6,374 Optimization of Conjunctive Predicates for Main Memory Column Stores 2016 VLDB 5.0927058e-05
6,838 Capturing Data Uncertainty in High-Volume Stream Processing 2009 CIDR 4.9109732e-05
7,358 Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries 2021 SIGMOD 4.7529363e-05
7,415 Efficient and Scalable Statistics Gathering for Large Databases in Oracle 11g 2008 SIGMOD 4.7355557e-05
7,699 Sketch-based Geometric Monitoring of Distributed Stream Queries 2013 VLDB 4.6746076e-05
7,834 Sketch-based Querying of Distributed Sliding-Window Data Streams 2012 VLDB 4.6382551e-05
8,350 alpha to omega: The Greek Alphabet of Sampling 2020 CIDR 4.5404832e-05
8,893 Histograms Reloaded: The Merits of Bucket Diversity 2010 SIGMOD 4.4275272e-05
9,038 OmniSketch: Efficient Multi-Dimensional High-Velocity Stream Analytics with Arbitrary Predicates 2024 VLDB 4.4039656e-05
10,227 Sample-based Distinct Cardinality Estimation for Multiple Attributes in Multi-Dataset Queries 2026 VLDB 4.1945683e-05
12,060 Statistics Collection in Oracle Spatial and Graph: Fast Histogram Construction for Complex Geometry Objects 2013 VLDB 4.1945683e-05
12,166 Get the Most out of Your Sample: Optimal Unbiased Estimators using Partial Information 2011 PODS 4.1945683e-05
12,475 A Simple and Efficient Estimation Method for Stream Expression Cardinalities 2007 VLDB 4.1945683e-05
12,478 Randomized Algorithms for Data Reconciliation in Wide Area Aggregate Query Processing 2007 VLDB 4.1945683e-05
12,531 Join-Distinct Aggregate Estimation over Update Streams 2005 PODS 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 19 of 19 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
14 Online Aggregation 1997 SIGMOD 0.0010801504
39 Statistical Estimators for Relational Algebra Expressions 1988 PODS 0.00074745564
59 Sampling-Based Estimation of the Number of Distinct Values of an Attribute 1995 VLDB 0.00064501896
64 Improved Histograms for Selectivity Estimation of Range Predicates 1996 SIGMOD 0.00063612837
134 Processing Aggregate Relational Queries with Hard Time Constraints 1989 SIGMOD 0.00042452811
184 New Sampling-Based Summary Statistics for Improving Approximate Query Answers 1998 SIGMOD 0.00036625711
211 Join Synopses for Approximate Query Answering 1999 SIGMOD 0.00033981214
217 Ripple Joins for Online Aggregation 1999 SIGMOD 0.00033536712
269 Fast Incremental Maintenance of Approximate Histograms 1997 VLDB 0.00029656549
273 Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets 1999 SIGMOD 0.00029390945
315 Error-Constrained COUNT Query Evaluation in Relational Databases 1991 SIGMOD 0.0002802103
378 Towards Estimation Error Guarantees for Distinct Values 2000 PODS 0.0002497492
405 Approximate Query Processing Using Wavelets 2000 VLDB 0.00024057494
429 The Aqua Approximate Query Answering System 1999 SIGMOD 0.00023476494
530 Random Sampling for Histogram Construction: How much is enough? 1998 SIGMOD 0.00020803682
647 Progressive Approximate Aggregate Queries with a Multi-Resolution Tree Structure 2001 SIGMOD 0.00018668224
739 Congressional Samples for Approximate Answering of Group-By Queries 2000 SIGMOD 0.00017401518
1,335 ICICLES: Self-tuning Samples for Approximate Query Answering 2000 VLDB 0.00012502131
2,808 A Robust, Optimization-Based Approach for Approximate Answering of Aggregate Queries 2001 SIGMOD 8.0870741e-05

Semantically Similar Papers