Compact Summaries over Large Datasets
Summary: A tutorial on designing and analyzing compact, mergeable summaries that capture the essential properties of a dataset. The summaries are built in parallel over distributed data and then combined, so that the result can be stored and queried on a single machine. It covers count-distinct estimators and linear sketches for norms and inner products, their space/error tradeoffs and mergeability, and efficient algorithms for computing and combining them. (summarized by gpt-5-mini on Feb 09 2026)
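To make the idea of a mergeable count-distinct summary concrete, here is a minimal sketch of one well-known estimator, the k-minimum-values (KMV) sketch. It is an illustration only and is not taken from the tutorial, which may present different estimators. The class name `KMVSketch`, the parameter `k`, and the use of SHA-256 as the hash function are all assumptions made for this example.

```python
import hashlib


class KMVSketch:
    """Illustrative k-minimum-values sketch for approximate distinct counting.

    Hypothetical example class: it keeps only the k smallest hash values seen.
    """

    def __init__(self, k=256):
        self.k = k
        self.values = set()  # at most k hash values, each in [0, 1)

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform float in [0, 1) (assumed hash choice).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, item):
        self.values.add(self._hash(item))
        if len(self.values) > self.k:
            # Evict the largest value so only the k smallest remain.
            self.values.remove(max(self.values))

    def merge(self, other):
        # The k smallest values of the union equal the sketch of the combined
        # data, which is what makes the summary mergeable.
        # Assumes both sketches use the same k and the same hash function.
        merged = KMVSketch(self.k)
        merged.values = set(sorted(self.values | other.values)[: self.k])
        return merged

    def estimate(self):
        if len(self.values) < self.k:
            # Fewer than k distinct hashes seen: the count is exact.
            return float(len(self.values))
        # Standard KMV estimator: (k - 1) / (k-th smallest hash value).
        return (self.k - 1) / max(self.values)


# Two partitions summarized independently, then combined on one machine.
left, right = KMVSketch(), KMVSketch()
for i in range(0, 6000):
    left.add(i)
for i in range(4000, 10000):
    right.add(i)
combined = left.merge(right)
print(round(combined.estimate()))  # approximate count of the 10,000 distinct items
```

Each partition stores at most `k` numbers regardless of its size, and merging two sketches gives the same state as sketching the combined data directly. In this variant the relative standard error shrinks roughly as one over the square root of `k`, which is the space/error tradeoff the summary refers to.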
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 109 | Dremel: Interactive Analysis of Web-Scale Datasets | 2010 | VLDB | 0.00048186983 |
| 126 | Space-Efficient Online Computation of Quantile Summaries | 2001 | SIGMOD | 0.00044744986 |
| 402 | Mergeable Summaries | 2012 | PODS | 0.00024196343 |
| 3,486 | Holistic UDAFs at Streaming Speeds | 2004 | SIGMOD | 7.0502199e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 435 | Efficient Aggregation for Graph Summarization | 2008 | SIGMOD | 0.00023260172 |
| 12,344 | Composable, Scalable, and Accurate Weight Summarization of Unaggregated Data Sets | 2009 | VLDB | 4.1945683e-05 |
| 6,244 | Approximate Distinct Counts for Billions of Datasets | 2019 | SIGMOD | 5.139669e-05 |
| 13,667 | Offline and Data Stream algorithms for efficient computation of synopsis structures | 2005 | VLDB | - |
| 5,968 | Summarizing Static and Dynamic Big Graphs | 2017 | VLDB | 5.2503253e-05 |
| 3,991 | Beyond Simple Aggregates: Indexing for Summary Queries | 2011 | PODS | 6.5553055e-05 |
| 8,605 | Structure-Aware Sampling: Flexible and Accurate Summarization | 2011 | VLDB | 4.4865144e-05 |
| 6,368 | Pre-training Summarization Models of Structured Datasets for Cardinality Estimation | 2022 | VLDB | 5.0937722e-05 |
| 3,536 | General purpose database summarization | 2005 | VLDB | 6.9990821e-05 |
| 10,927 | Computing A Well-Representative Summary of Conjunctive Query Results | 2024 | PODS | 4.1945683e-05 |