Steering Query Optimizers: A Practical Take on Big Data Workloads
Summary: Adapts Bao-style query optimizer steering to SCOPE for big data workloads. Introduces rule signatures, a pipeline for recurring job configurations, and a learning method for unseen workloads. Evaluated on 150K daily jobs, showing 7–30% latency savings and up to 90% on a subset.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 6257
- Venue: SIGMOD
- Year: 2021
- Pagerank: 5.2412035e-05
- Overall Rank: 6,040 | 57.99%
- DOI: 10.1145/3448016.3457568
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 21 of 21 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,248 | A Learned Query Rewrite System using Monte Carlo Tree Search | 2022 | VLDB | 7.3258782e-05 |
| 3,348 | Lero: A Learning-to-Rank Query Optimizer | 2023 | VLDB | 7.1904529e-05 |
| 3,990 | FactorJoin: A New Cardinality Estimation Framework for Join Queries | 2023 | SIGMOD | 6.5581983e-05 |
| 4,593 | Auto-WLM: Machine Learning Enhanced Workload Management in Amazon Redshift | 2023 | SIGMOD | 6.0606891e-05 |
| 4,690 | Deploying a Steered Query Optimizer in Production at Microsoft | 2022 | SIGMOD | 5.997226e-05 |
| 5,334 | LEON: A New Framework for ML-Aided Query Optimization | 2023 | VLDB | 5.5649836e-05 |
| 5,640 | AutoSteer: Learned Query Optimization for Any SQL Database | 2023 | VLDB | 5.3933314e-05 |
| 6,297 | Towards instance-optimized data systems | 2021 | VLDB | 5.1227886e-05 |
| 6,885 | PilotScope: Steering Databases with Machine Learning Drivers | 2024 | VLDB | 4.895386e-05 |
| 7,655 | Machine Learning for Cloud Data Systems: the Progress so far and the Path Forward | 2021 | VLDB | 4.6872456e-05 |
| 8,164 | Efficiently Computing Join Orders with Heuristic Search | 2023 | SIGMOD | 4.5718104e-05 |
| 8,197 | SparkCruise: Workload Optimization in Managed Spark Clusters at Microsoft | 2021 | VLDB | 4.5607121e-05 |
| 8,220 | PerfGuard: Deploying ML-for-Systems without Performance Regressions, Almost! | 2021 | VLDB | 4.5557328e-05 |
| 8,416 | Towards Building Autonomous Data Services on Azure | 2023 | SIGMOD | 4.5196199e-05 |
| 8,582 | Towards Query Optimizer as a Service (QOaaS) in a Unified LakeHouse Ecosystem: Can One QO Rule Them All? | 2025 | CIDR | 4.492033e-05 |
| 8,659 | Learned Offline Query Planning via Bayesian Optimization | 2025 | SIGMOD | 4.4722928e-05 |
| 8,783 | GEqO: ML-Accelerated Semantic Equivalence Detection | 2023 | SIGMOD | 4.452825e-05 |
| 9,006 | Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management Systems | 2024 | VLDB | 4.4101482e-05 |
| 9,587 | Low Rank Learning for Offline Query Optimization | 2025 | SIGMOD | 4.3215645e-05 |
| 9,710 | QO-Insight: Inspecting Steered Query Optimizers | 2023 | VLDB | 4.299267e-05 |
| 10,491 | Intra-Query Runtime Elasticity for Cloud-Native Data Analysis | 2025 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 17 of 17 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613 |
| 71 | How Good Are Query Optimizers, Really? | 2016 | VLDB | 0.00059038975 |
| 167 | The Snowflake Elastic Data Warehouse | 2016 | SIGMOD | 0.00039180521 |
| 204 | Learned Cardinalities: Estimating Correlated Joins with Deep Learning | 2019 | CIDR | 0.00034784455 |
| 333 | Neo: A Learned Query Optimizer | 2019 | VLDB | 0.00027206884 |
| 544 | Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources | 2018 | SIGMOD | 0.00020521965 |
| 758 | Deep Unsupervised Cardinality Estimation | 2020 | VLDB | 0.0001706608 |
| 906 | F1: A Distributed SQL Database That Scales | 2013 | VLDB | 0.00015448884 |
| 910 | NeuroCard: One Cardinality Estimator for All Tables | 2021 | VLDB | 0.00015423056 |
| 1,254 | Selectivity Estimation for Range Predicates using Lightweight Models | 2019 | VLDB | 0.00013027411 |
| 1,300 | The Picasso Database Query Optimizer Visualizer | 2010 | VLDB | 0.00012733214 |
| 2,083 | Towards a Learning Optimizer for Shared Clouds | 2019 | VLDB | 9.5834572e-05 |
| 3,625 | Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings | 2020 | SIGMOD | 6.9055212e-05 |
| 3,954 | Efficiently Approximating Selectivity Functions using Low Overhead Regression Models | 2020 | VLDB | 6.5926838e-05 |
| 4,174 | Computation Reuse in Analytics Job Service at Microsoft | 2018 | SIGMOD | 6.3856219e-05 |
| 6,763 | Robustness Metrics for Relational Query Execution Plans | 2018 | VLDB | 4.9338479e-05 |
| 7,684 | AutoToken: Predicting Peak Parallelism for Big Data Analytics at Microsoft | 2020 | VLDB | 4.6796855e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,120 | Deep Query Optimization | 2019 | SIGMOD | 4.392741e-05 |
| 5,014 | Dynamically Optimizing Queries over Large Scale Data Platforms | 2014 | SIGMOD | 5.7586174e-05 |
| 7,828 | Modeling Shifting Workloads for Learned Database Systems | 2024 | SIGMOD | 4.6407986e-05 |
| 3,658 | Towards a Hands-Free Query Optimizer through Deep Learning | 2019 | CIDR | 6.8704209e-05 |
| 6,667 | Leveraging Query Logs and Machine Learning for Parametric Query Optimization | 2022 | VLDB | 4.9688874e-05 |
| 5,640 | AutoSteer: Learned Query Optimization for Any SQL Database | 2023 | VLDB | 5.3933314e-05 |
| 5,297 | Continuous Cloud-Scale Query Optimization and Processing | 2013 | VLDB | 5.5801669e-05 |
| 3,625 | Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings | 2020 | SIGMOD | 6.9055212e-05 |
| 6,673 | Incorporating Super-Operators in Big-Data Query Optimizers | 2020 | VLDB | 4.966799e-05 |
| 4,690 | Deploying a Steered Query Optimizer in Production at Microsoft | 2022 | SIGMOD | 5.997226e-05 |