KEA: Tuning an Exabyte-Scale Data Infrastructure
Summary: KEA automates the tuning of an exabyte-scale data infrastructure by building ML models from telemetry, applying observational tuning, and cautiously flighting changes in production. It is described as the first study to address data-management tuning at exabyte scale, with potential annual savings in the tens of millions.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 6258
- Venue: SIGMOD
- Year: 2021
- Pagerank: 4.9372134e-05
- Overall Rank: 6,757 | 53.00%
- DOI: 10.1145/3448016.3457569
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 14 of 14 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,429 | Real-time Workload Pattern Analysis for Large-scale Cloud Databases | 2023 | VLDB | 7.1010535e-05 |
| 6,110 | Doppler: Automated SKU Recommendation in Migrating SQL Workloads to the Cloud | 2022 | VLDB | 5.2056003e-05 |
| 6,261 | The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward | 2021 | VLDB | 5.1350714e-05 |
| 6,871 | Towards General and Efficient Online Tuning for Spark | 2023 | VLDB | 4.8997004e-05 |
| 7,655 | Machine Learning for Cloud Data Systems: the Progress so far and the Path Forward | 2021 | VLDB | 4.6872456e-05 |
| 7,778 | Runtime Variation in Big Data Analytics | 2023 | SIGMOD | 4.653651e-05 |
| 8,416 | Towards Building Autonomous Data Services on Azure | 2023 | SIGMOD | 4.5196199e-05 |
| 8,783 | GEqO: ML-Accelerated Semantic Equivalence Detection | 2023 | SIGMOD | 4.452825e-05 |
| 8,854 | Optimizing the cloud? Don't train models. Build oracles! | 2024 | CIDR | 4.4349047e-05 |
| 9,074 | Making Data Clouds Smarter at Keebo: Automated Warehouse Optimization using Data Learning | 2023 | SIGMOD | 4.402065e-05 |
| 9,190 | MLOS in Action: Bridging the Gap Between Experimentation and Auto-Tuning in the Cloud | 2024 | VLDB | 4.3768215e-05 |
| 9,689 | LST-Bench: Benchmarking Log-Structured Tables in the Cloud | 2024 | SIGMOD | 4.3043822e-05 |
| 10,966 | Lorentz: Learned SKU Recommendation Using Profile Data (DMDS) | 2024 | SIGMOD | 4.1945683e-05 |
| 11,011 | Intelligent Pooling: Proactive Resource Provisioning in Large-scale Cloud Service | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 13 of 13 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613 |
| 70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166 |
| 183 | Automatic Database Management System Tuning Through Large-scale Machine Learning | 2017 | SIGMOD | 0.00036721403 |
| 514 | An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning | 2019 | SIGMOD | 0.0002124895 |
| 868 | Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs | 2011 | VLDB | 0.00015789681 |
| 1,071 | Starfish: A Self-tuning System for Big Data Analytics | 2011 | CIDR | 0.00014312777 |
| 2,817 | Recurring Job Optimization in Scope | 2012 | SIGMOD | 8.0677653e-05 |
| 3,038 | Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics | 2017 | SIGMOD | 7.6717218e-05 |
| 4,015 | Take me to your leader! Online Optimization of Distributed Storage Configurations | 2015 | VLDB | 6.5272549e-05 |
| 4,061 | Advanced Partitioning Techniques for Massively Distributed Computation | 2012 | SIGMOD | 6.483587e-05 |
| 4,248 | Hyper Dimension Shuffle: Efficient Data Repartition at Petabyte Scale in SCOPE | 2019 | VLDB | 6.3247927e-05 |
| 5,297 | Continuous Cloud-Scale Query Optimization and Processing | 2013 | VLDB | 5.5801669e-05 |
| 7,067 | JetScope: Reliable and Interactive Analytics at Cloud Scale | 2015 | VLDB | 4.8440936e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,074 | Making Data Clouds Smarter at Keebo: Automated Warehouse Optimization using Data Learning | 2023 | SIGMOD | 4.402065e-05 |
| 6,261 | The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward | 2021 | VLDB | 5.1350714e-05 |
| 4,380 | LlamaTune: Sample-Efficient DBMS Configuration Tuning | 2022 | VLDB | 6.2396606e-05 |
| 6,268 | Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems | 2019 | VLDB | 5.133857e-05 |
| 6,456 | From Auto-tuning One Size Fits All to Self-designed and Learned Data-intensive Systems | 2019 | SIGMOD | 5.0564619e-05 |
| 7,778 | Runtime Variation in Big Data Analytics | 2023 | SIGMOD | 4.653651e-05 |
| 5,297 | Continuous Cloud-Scale Query Optimization and Processing | 2013 | VLDB | 5.5801669e-05 |
| 7,684 | AutoToken: Predicting Peak Parallelism for Big Data Analytics at Microsoft | 2020 | VLDB | 4.6796855e-05 |
| 3,625 | Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings | 2020 | SIGMOD | 6.9055212e-05 |
| 6,040 | Steering Query Optimizers: A Practical Take on Big Data Workloads | 2021 | SIGMOD | 5.2412035e-05 |