Towards General and Efficient Online Tuning for Spark
Summary: A general Spark tuner based on Bayesian optimization (BO) with a unified multi-objective and constraint formulation. It performs safe online configuration search during real periodic job runs, eliminating offline evaluation overhead. It uses adaptive sub-space generation, approximate gradient descent, and meta-learning to accelerate the search. Deployed in production at Tencent, it saved ~57% memory and ~35% CPU on 25K tasks within 20 iterations.
(summarized by gpt-5-mini on Feb 09 2026)
- Paper ID: 13188
- Venue: VLDB
- Year: 2023
- Pagerank: 4.8997004e-05
- Overall Rank: 6,871 | 52.21%
- DOI: 10.14778/3611540.3611548
Incoming Citations (Sorted by Pagerank)
Showing 6 of 6 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 22 of 22 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801 |
| 183 | Automatic Database Management System Tuning Through Large-scale Machine Learning | 2017 | SIGMOD | 0.00036721403 |
| 288 | Storm @Twitter | 2014 | SIGMOD | 0.00028939871 |
| 424 | Tuning Database Configuration Parameters with iTuned | 2009 | VLDB | 0.00023616398 |
| 514 | An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning | 2019 | SIGMOD | 0.0002124895 |
| 542 | Shark: SQL and Rich Analytics at Scale | 2013 | SIGMOD | 0.00020595648 |
| 716 | Query-based Workload Forecasting for Self-Driving Database Management Systems | 2018 | SIGMOD | 0.00017723171 |
| 782 | QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning | 2019 | VLDB | 0.00016729063 |
| 824 | Twitter Heron: Stream Processing at Scale | 2015 | SIGMOD | 0.0001623129 |
| 1,071 | Starfish: A Self-tuning System for Big Data Analytics | 2011 | CIDR | 0.00014312777 |
| 1,902 | Black or White? How to Develop an AutoTuner for Memory-based Analytics | 2020 | SIGMOD | 0.00010157713 |
| 2,338 | Samza: Stateful Scalable Stream Processing at LinkedIn | 2017 | VLDB | 9.00711e-05 |
| 2,839 | VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition | 2021 | VLDB | 8.0378978e-05 |
| 3,522 | ResTune: Resource Oriented Tuning Boosted by Meta-Learning for Cloud Databases | 2021 | SIGMOD | 7.0096727e-05 |
| 3,812 | Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation | 2022 | VLDB | 6.7373184e-05 |
| 3,914 | A Demonstration of the OtterTune Automatic Database Management System Tuning Service | 2018 | VLDB | 6.6339644e-05 |
| 4,399 | HUNTER: An Online Cloud Database Hybrid Tuning System for Personalized Requirements | 2022 | SIGMOD | 6.2225151e-05 |
| 4,842 | Towards Dynamic and Safe Configuration Tuning for Cloud Databases | 2022 | SIGMOD | 5.8826802e-05 |
| 5,833 | LOCAT: Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications | 2022 | SIGMOD | 5.3106182e-05 |
| 6,379 | A Unified and Efficient Coordinating Framework for Autonomous DBMS Tuning | 2023 | SIGMOD | 5.0909479e-05 |
| 6,757 | KEA: Tuning an Exabyte-Scale Data Infrastructure | 2021 | SIGMOD | 4.9372134e-05 |
| 9,375 | Efficient Big Data Processing in Hadoop MapReduce | 2012 | VLDB | 4.347384e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,200 | Big Data Analytics with Datalog Queries on Spark | 2016 | SIGMOD | 7.3912411e-05 |
| 9,192 | Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale | 2022 | VLDB | 4.3765131e-05 |
| 8,197 | SparkCruise: Workload Optimization in Managed Spark Clusters at Microsoft | 2021 | VLDB | 4.5607121e-05 |
| 10,414 | Rockhopper: A Robust Optimizer for Spark Configuration Tuning in Production Environment | 2025 | SIGMOD | 4.1945683e-05 |
| 9,733 | ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems | 2023 | VLDB | 4.2942813e-05 |
| 9,155 | Towards Resource Efficiency: Practical Insights into Large-Scale Spark Workloads at ByteDance | 2024 | VLDB | 4.3849295e-05 |
| 5,833 | LOCAT: Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications | 2022 | SIGMOD | 5.3106182e-05 |
| 4,842 | Towards Dynamic and Safe Configuration Tuning for Cloud Databases | 2022 | SIGMOD | 5.8826802e-05 |
| 6,268 | Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems | 2019 | VLDB | 5.133857e-05 |
| 8,617 | A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning | 2024 | VLDB | 4.4846425e-05 |