LEAP: A Low-cost Spark SQL Query Optimizer using Pairwise Comparison
Summary: LEAP is a learned query optimizer tailored for Spark SQL that integrates natively and ranks candidate plans via estimation-free pairwise comparisons rather than a cost model. It combines this ranking with progressive, pruned plan enumeration to find better plans cheaply, cutting end-to-end time by up to 54% relative to Spark and by up to 94% relative to other learned methods.
(summarized by gpt-5-mini on Feb 09 2026)
- Paper ID: 14228
- Venue: VLDB
- Year: 2025
- Pagerank: 4.1945683e-05
- Overall Rank: 10,868 | 24.40%
- DOI: 10.14778/3712221.3712234
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
Outgoing Citations (Sorted by Pagerank)
Showing 27 of 27 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| 66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801 |
| 71 | How Good Are Query Optimizers, Really? | 2016 | VLDB | 0.00059038975 |
| 141 | Selectivity Estimation Without the Attribute Value Independence Assumption | 1997 | VLDB | 0.00041786333 |
| 204 | Learned Cardinalities: Estimating Correlated Joins with Deep Learning | 2019 | CIDR | 0.00034784455 |
| 333 | Neo: A Learned Query Optimizer | 2019 | VLDB | 0.00027206884 |
| 640 | Bao: Making Learned Query Optimization Practical | 2021 | SIGMOD | 0.00018759152 |
| 806 | An End-to-End Learning-based Cost Estimator | 2020 | VLDB | 0.00016434274 |
| 910 | NeuroCard: One Cardinality Estimator for All Tables | 2021 | VLDB | 0.00015423056 |
| 1,758 | Sampling-Based Query Re-Optimization | 2016 | SIGMOD | 0.00010655546 |
| 2,121 | Balsa: Learning a Query Optimizer Without Expert Demonstrations | 2022 | SIGMOD | 9.5017232e-05 |
| 2,254 | Two-Level Sampling for Join Size Estimation | 2017 | SIGMOD | 9.1897043e-05 |
| 2,762 | FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation | 2021 | VLDB | 8.1585394e-05 |
| 2,783 | Flow-Loss: Learning Cardinality Estimates That Matter | 2021 | VLDB | 8.1293383e-05 |
| 2,869 | The Complexity of Transformation-Based Join Enumeration | 1997 | VLDB | 7.9808408e-05 |
| 3,169 | QueryFormer: A Tree Transformer Model for Query Plan Representation | 2022 | VLDB | 7.4498425e-05 |
| 3,348 | Lero: A Learning-to-Rank Query Optimizer | 2023 | VLDB | 7.1904529e-05 |
| 3,449 | Learned Cardinality Estimation: A Design Space Exploration and A Comparative Evaluation | 2022 | VLDB | 7.0824319e-05 |
| 3,727 | Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection | 2022 | VLDB | 6.8141709e-05 |
| 3,990 | FactorJoin: A New Cardinality Estimation Framework for Join Queries | 2023 | SIGMOD | 6.5581983e-05 |
| 4,417 | Robust Query Driven Cardinality Estimation under Changing Workloads | 2023 | VLDB | 6.2037371e-05 |
| 4,462 | LOGER: A Learned Optimizer towards Generating Efficient and Robust Query Execution Plans | 2023 | VLDB | 6.1611784e-05 |
| 4,543 | FACE: A Normalizing Flow based Cardinality Estimator | 2022 | VLDB | 6.1011198e-05 |
| 5,334 | LEON: A New Framework for ML-Aided Query Optimization | 2023 | VLDB | 5.5649836e-05 |
| 5,401 | ALECE: An Attention-based Learned Cardinality Estimator for SPJ Queries on Dynamic Workloads | 2024 | VLDB | 5.5285035e-05 |
| 5,833 | LOCAT: Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications | 2022 | SIGMOD | 5.3106182e-05 |
| 6,328 | A Comparative Study and Component Analysis of Query Plan Representation Techniques in ML4DB Studies | 2024 | VLDB | 5.1082882e-05 |
| 7,221 | Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation | 2023 | SIGMOD | 4.797194e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| 3,348 | Lero: A Learning-to-Rank Query Optimizer | 2023 | VLDB | 7.1904529e-05 |
| 6,040 | Steering Query Optimizers: A Practical Take on Big Data Workloads | 2021 | SIGMOD | 5.2412035e-05 |
| 5,718 | Conjunctive Queries with Comparisons | 2022 | SIGMOD | 5.3552123e-05 |
| 6,685 | How Good are Learned Cost Models, Really? Insights from Query Optimization Tasks | 2025 | SIGMOD | 4.9627485e-05 |
| 6,667 | Leveraging Query Logs and Machine Learning for Parametric Query Optimization | 2022 | VLDB | 4.9688874e-05 |
| 5,952 | Eraser: Eliminating Performance Regression on Learned Query Optimizer | 2024 | VLDB | 5.2591691e-05 |
| 10,219 | Practical Parameterized Query Optimization via Efficient Plan Reuse and List-wise Ranking | 2026 | SIGMOD | 4.1945683e-05 |
| 9,124 | Dynamic Speculative Optimizations for SQL Compilation in Apache Spark | 2020 | VLDB | 4.391961e-05 |
| 3,727 | Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection | 2022 | VLDB | 6.8141709e-05 |
| 8,617 | A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning | 2024 | VLDB | 4.4846425e-05 |