Composable, Scalable, and Accurate Weight Summarization of Unaggregated Data Sets
Summary: The paper presents a composable, scalable sampling-aggregation framework for weight summarization of unaggregated data. It gives unbiased estimates for subpopulations selected by arbitrary predicates. No variance-optimal scheme exists, but variance improves with more aggregation. In experiments, the approach beats prior methods on streams and distributed data. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Edith Cohen
2. Nick Duffield
3. Haim Kaplan
4. Carsten Lund
5. Mikkel Thorup
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 18 | On Random Sampling over Joins | 1999 | SIGMOD | 0.00092385438 |
| 184 | New Sampling-Based Summary Statistics for Improving Approximate Query Answers | 1998 | SIGMOD | 0.00036625711 |
| 3,928 | Tighter Estimation using Bottom-k Sketches | 2008 | VLDB | 6.6254568e-05 |
| 5,117 | Sampling Algorithms in a Stream Operator | 2005 | SIGMOD | 5.6825418e-05 |
| 7,547 | Sketching Unaggregated Data Streams for Subpopulation-Size Queries | 2007 | PODS | 4.7144329e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,452 | On the algebra of data sketches | 2021 | VLDB | 4.5086031e-05 |
| 3,385 | Estimating Statistical Aggregates on Probabilistic Data Streams | 2007 | PODS | 7.1580391e-05 |
| 8,605 | Structure-Aware Sampling: Flexible and Accurate Summarization | 2011 | VLDB | 4.4865144e-05 |
| 12,166 | Get the Most out of Your Sample: Optimal Unbiased Estimators using Partial Information | 2011 | PODS | 4.1945683e-05 |
| 8,365 | Efficient Summarization Framework for Multi-Attribute Uncertain Data | 2014 | SIGMOD | 4.5357797e-05 |
| 12,108 | Space-Efficient Estimation of Statistics over Sub-Sampled Streams | 2012 | PODS | 4.1945683e-05 |
| 3,928 | Tighter Estimation using Bottom-k Sketches | 2008 | VLDB | 6.6254568e-05 |
| 7,547 | Sketching Unaggregated Data Streams for Subpopulation-Size Queries | 2007 | PODS | 4.7144329e-05 |
| 3,271 | Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation | 2018 | SIGMOD | 7.2968732e-05 |
| 5,415 | Coordinated Weighted Sampling for Estimating Aggregates Over Multiple Weight Assignments | 2009 | VLDB | 5.5196338e-05 |