Database Paper Browser

On Synopses for Distinct-Value Estimation Under Multiset Operations

Summary: Distinct-value (DV) estimation built on a scalable synopsis store. Synopses are computed per partition in parallel and can be combined to cover multiset unions, intersections, and differences. The estimators, based on order statistics, are unbiased; a limit theorem, used together with results due to Cohen, guides the choice of synopsis size, which lowers cost and improves accuracy. (summarized by gpt-5-nano on Feb 09 2026)
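
The combine-then-estimate idea in the summary can be illustrated with a minimal sketch. It assumes a k-minimum-values style synopsis (the k smallest hash values of a partition) and the order-statistics estimate (k - 1) / U_(k), where U_(k) is the k-th smallest hash; the summary itself does not name the synopsis type. The function names (unit_hash, build_synopsis, union_synopsis, estimate_dv) and the choice of SHA-256 are hypothetical, chosen only for illustration. Only the union case is shown; intersections and differences need extra bookkeeping that is not sketched here.

```python
import hashlib


def unit_hash(value):
    """Map a value to a pseudo-uniform float in [0, 1). Hypothetical helper."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def build_synopsis(values, k):
    """Synopsis of one partition: its k smallest distinct hash values, sorted."""
    return sorted({unit_hash(v) for v in values})[:k]


def union_synopsis(a, b, k):
    """Synopsis of the union of two partitions, built from their synopses alone.

    Both inputs must have been built with the same k and the same hash function.
    """
    return sorted(set(a) | set(b))[:k]


def estimate_dv(synopsis, k):
    """Estimate the number of distinct values from a synopsis.

    With fewer than k hashes the synopsis holds every distinct value, so the
    count is exact; otherwise use (k - 1) / U_(k).
    """
    if len(synopsis) < k:
        return float(len(synopsis))
    return (k - 1) / synopsis[k - 1]


if __name__ == "__main__":
    k = 256
    part_a = build_synopsis(range(0, 6000), k)      # partition A
    part_b = build_synopsis(range(4000, 10000), k)  # partition B, overlaps A
    combined = union_synopsis(part_a, part_b, k)
    print("estimated DVs in A union B:", round(estimate_dv(combined, k)))
```

Because each partition is summarized independently, the two build_synopsis calls could run in parallel, and the union estimate never touches the raw data again. A larger k trades space for lower variance.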

Paper ID: 3851
Venue: SIGMOD
Year: 2007
Pagerank: 0.00017508726
Overall Rank: 727 | 94.95%
DOI: -

Authors

Incoming Citations (Sorted by Pagerank)

Showing 34 of 34 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
383 | An Optimal Algorithm for the Distinct Elements Problem | 2010 | PODS | 0.00024820873
1,323 | Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters | 2016 | SIGMOD | 0.00012601997
1,664 | On Multi-Column Foreign Key Discovery | 2010 | VLDB | 0.00010976887
1,683 | Cardinality Estimation: An Experimental Survey | 2018 | VLDB | 0.00010922679
1,967 | Compressed Linear Algebra for Large-Scale Machine Learning | 2016 | VLDB | 9.9131712e-05
2,184 | A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data | 2014 | SIGMOD | 9.3429789e-05
2,444 | Brighthouse: An Analytic Data Warehouse for Ad-hoc Queries | 2008 | VLDB | 8.8076551e-05
2,779 | Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries | 2008 | VLDB | 8.1320575e-05
2,878 | Sampling Time-Based Sliding Windows in Bounded Space | 2008 | SIGMOD | 7.9706235e-05
3,256 | Multidimensional Content eXploration | 2008 | VLDB | 7.3158557e-05
3,824 | Correlation Sketches for Approximate Join-Correlation Queries | 2021 | SIGMOD | 6.7260705e-05
3,928 | Tighter Estimation using Bottom-k Sketches | 2008 | VLDB | 6.6254568e-05
3,991 | Beyond Simple Aggregates: Indexing for Summary Queries | 2011 | PODS | 6.5553055e-05
4,833 | MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions | 2019 | SIGMOD | 5.8916346e-05
4,905 | Randomized Error Removal for Online Spread Estimation in Data Streaming | 2021 | VLDB | 5.8398332e-05
5,014 | Dynamically Optimizing Queries over Large Scale Data Platforms | 2014 | SIGMOD | 5.7586174e-05
5,361 | Efficient Estimation of Inclusion Coefficient using HyperLogLog Sketches | 2018 | VLDB | 5.547935e-05
5,415 | Coordinated Weighted Sampling for Estimating Aggregates Over Multiple Weight Assignments | 2009 | VLDB | 5.5196338e-05
6,368 | Pre-training Summarization Models of Structured Datasets for Cardinality Estimation | 2022 | VLDB | 5.0937722e-05
7,122 | Parallel Algorithms for Sparse Matrix Multiplication and Join-Aggregate Queries | 2020 | PODS | 4.8252188e-05
7,358 | Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries | 2021 | SIGMOD | 4.7529363e-05
7,415 | Efficient and Scalable Statistics Gathering for Large Databases in Oracle 11g | 2008 | SIGMOD | 4.7355557e-05
7,430 | Adaptive Log Compression for Massive Log Data | 2013 | SIGMOD | 4.7317713e-05
7,645 | Selectivity Estimation on Streaming Spatio-Textual Data Using Local Correlations | 2015 | VLDB | 4.6896215e-05
8,514 | UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads | 2022 | VLDB | 4.4944285e-05
9,187 | POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance | 2024 | VLDB | 4.3780059e-05
10,358 | Robust Statistical Analysis on Streaming Data with Near-Duplicates in General Metric Spaces | 2025 | PODS | 4.1945683e-05
10,498 | PLM4NDV: Minimizing Data Access for Number of Distinct Values Estimation with Pre-trained Language Models | 2025 | SIGMOD | 4.1945683e-05
10,639 | Cardinality Estimation for Having-Clauses | 2025 | VLDB | 4.1945683e-05
11,025 | Sampling Methods for Inner Product Sketching | 2024 | VLDB | 4.1945683e-05
11,168 | Weighted Minwise Hashing Beats Linear Sketching for Inner Product Estimation | 2023 | PODS | 4.1945683e-05
11,414 | No Repetition: Fast and Reliable Sampling with Highly Concentrated Hashing | 2022 | VLDB | 4.1945683e-05
11,833 | Streaming Algorithms for Robust Distinct Elements | 2016 | SIGMOD | 4.1945683e-05
12,166 | Get the Most out of Your Sample: Optimal Unbiased Estimators using Partial Information | 2011 | PODS | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 7 of 7 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers