Why TPC Is Not Enough: An Analysis of the Amazon Redshift Fleet
Summary: Telemetry from 400 Amazon Redshift instances shows that cloud data-warehouse workloads diverge from TPC-H/DS in four ways: write-heavy pipelines are prominent, load and query types vary over time, queries are repetitive, and query and workload properties follow heavy-tailed distributions. The paper argues that benchmarks must broaden beyond pure query-engine throughput to model ingestion, temporal dynamics, repetition, and tail behavior, and it releases a 3-month query-statistics dataset to seed more realistic benchmark design.
(summarized by gpt-5-mini on Feb 09 2026)
- Paper ID: 13575
- Venue: VLDB
- Year: 2024
- Pagerank: 7.4325992e-05
- Overall Rank: 3,178 | 77.90%
- DOI: 10.14778/3681954.3682031
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 25 of 25 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,915 | Runtime-Extensible Parsers | 2025 | CIDR | 5.274713e-05 |
| 6,685 | How Good are Learned Cost Models, Really? Insights from Query Optimization Tasks | 2025 | SIGMOD | 4.9627485e-05 |
| 8,207 | SQLStorm: Taking Database Benchmarking into the LLM Era | 2025 | VLDB | 4.5583637e-05 |
| 8,659 | Learned Offline Query Planning via Bayesian Optimization | 2025 | SIGMOD | 4.4722928e-05 |
| 8,718 | Parachute: Single-Pass Bi-Directional Information Passing | 2025 | VLDB | 4.4612599e-05 |
| 8,847 | Towards Foundation Database Models | 2025 | CIDR | 4.4371897e-05 |
| 8,884 | Workload Insights From The Snowflake Data Cloud: What Do Production Analytic Queries Really Look Like? | 2025 | VLDB | 4.4283999e-05 |
| 9,392 | Demonstrating SQLBarber: Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads | 2025 | SIGMOD | 4.3441378e-05 |
| 9,587 | Low Rank Learning for Offline Query Optimization | 2025 | SIGMOD | 4.3215645e-05 |
| 9,973 | End-to-End Declarative Data Analytics: Co-designing Engines, Interfaces, and Cloud Infrastructure | 2026 | CIDR | 4.1945683e-05 |
| 9,981 | Survivorship Bias in Industrial Database Workloads | 2026 | CIDR | 4.1945683e-05 |
| 10,212 | SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads | 2026 | SIGMOD | 4.1945683e-05 |
| 10,217 | This is Going to Sound Crazy, But What If We Used Large Language Models to Boost Automatic Database Tuning Algorithms By Leveraging Prior History? We Will Find Better Configurations More Quickly Than Retraining From Scratch! | 2026 | SIGMOD | 4.1945683e-05 |
| 10,219 | Practical Parameterized Query Optimization via Efficient Plan Reuse and List-wise Ranking | 2026 | SIGMOD | 4.1945683e-05 |
| 10,248 | Active Data Lakes: Regaining Physical Data Independence Without Losing Interoperability | 2026 | VLDB | 4.1945683e-05 |
| 10,368 | B-Trees Are Back: Engineering Fast and Pageable Node Layouts | 2025 | SIGMOD | 4.1945683e-05 |
| 10,405 | Flux: Unifying Heterogeneous Infrastructure for Alibaba AnalyticDB | 2025 | SIGMOD | 4.1945683e-05 |
| 10,707 | PBench: Workload Synthesizer with Real Statistics for Cloud Analytics Benchmarking | 2025 | VLDB | 4.1945683e-05 |
| 10,726 | Improving DBMS Scheduling Decisions with Accurate Performance Prediction on Concurrent Queries | 2025 | VLDB | 4.1945683e-05 |
| 10,767 | The HANA Native Query Engine for Lakehouse Systems | 2025 | VLDB | 4.1945683e-05 |
| 10,847 | Sampling-based Predictive Database Buffer Management | 2025 | VLDB | 4.1945683e-05 |
| 10,850 | Mayura: Exploiting Similarities in Motifs for Temporal Co-Mining | 2025 | VLDB | 4.1945683e-05 |
| 10,852 | CloudGlide: Deconstructing the Landscape of Cloud-Based Analytics | 2025 | VLDB | 4.1945683e-05 |
| 10,854 | LiquidCache: Efficient Pushdown Caching for Cloud-Native Data Analytics | 2025 | VLDB | 4.1945683e-05 |
| 10,859 | Graph Transformers for Query Plan Representation: Potentials and Challenges | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 13 of 13 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 629 | Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors | 2009 | VLDB | 0.00018942366 |
| 1,284 | Amazon Redshift Re-invented | 2022 | SIGMOD | 0.00012837822 |
| 1,889 | Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads | 2021 | VLDB | 0.00010200865 |
| 2,568 | Towards Cost-Optimal Query Processing in the Cloud | 2021 | VLDB | 8.5239227e-05 |
| 2,965 | SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment | 2016 | SIGMOD | 7.8059273e-05 |
| 3,437 | Speculative Distributed CSV Data Parsing for Big Data Analytics | 2019 | SIGMOD | 7.0942161e-05 |
| 3,753 | Choosing A Cloud DBMS: Architectures and Tradeoffs | 2019 | VLDB | 6.7871241e-05 |
| 4,593 | Auto-WLM: Machine Learning Enhanced Workload Management in Amazon Redshift | 2023 | SIGMOD | 6.0606891e-05 |
| 4,717 | Cloud Analytics Benchmark | 2023 | VLDB | 5.9751539e-05 |
| 5,634 | Intelligent Scaling in Amazon Redshift | 2024 | SIGMOD | 5.4000904e-05 |
| 6,972 | Predicate Caching: Query-Driven Secondary Indexing for Cloud Data Warehouses | 2024 | SIGMOD | 4.8785237e-05 |
| 8,225 | Automated Multidimensional Data Layouts in Amazon Redshift | 2024 | SIGMOD | 4.555289e-05 |
| 8,442 | SageDB: An Instance-Optimized Data Analytics System | 2022 | VLDB | 4.5120602e-05 |
Semantically Similar Papers