Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload
Summary: Prestroid uses tree convolution to predict SQL query resource usage from traces, reducing encoding and padding waste in large-scale deep-learning training. On 19k Presto queries over 20 PB of data, it outperforms baselines and cuts memory usage 13.5x and epoch time 3.45x, with up to 13.2x cost savings on Azure. (summarized by gpt-5-nano on Feb 09 2026)
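This page does not describe Prestroid's exact architecture. The block below is a minimal pure-Python sketch of generic tree convolution over a binary query-plan tree, in the style used by Neo (listed among the cited papers), assuming a hypothetical fixed-length feature vector per plan node. `PlanNode`, `tree_conv`, `max_pool`, `predict_cost`, and all weights and features are illustrative assumptions, not the authors' code or data.

```python
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class PlanNode:
    """One node of a binary query-plan tree (hypothetical fixed-length encoding)."""
    features: Vector
    left: Optional["PlanNode"] = None
    right: Optional["PlanNode"] = None


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def tree_conv(node: PlanNode, w_top: Matrix, w_left: Matrix,
              w_right: Matrix, bias: Vector) -> PlanNode:
    """One tree-convolution layer: each output node combines the parent's features
    with its left and right children's features (zeros if a child is absent).
    The output tree has the same shape as the input tree."""
    zero = [0.0] * len(node.features)
    left_feat = node.left.features if node.left else zero
    right_feat = node.right.features if node.right else zero
    out = relu(add(matvec(w_top, node.features),
                   matvec(w_left, left_feat),
                   matvec(w_right, right_feat),
                   bias))
    args = (w_top, w_left, w_right, bias)
    return PlanNode(
        out,
        tree_conv(node.left, *args) if node.left else None,
        tree_conv(node.right, *args) if node.right else None,
    )


def max_pool(node: PlanNode) -> Vector:
    """Dynamic pooling: element-wise max over all nodes, giving a fixed-size
    vector regardless of how many operators the plan contains."""
    pooled = list(node.features)
    for child in (node.left, node.right):
        if child is not None:
            pooled = [max(a, b) for a, b in zip(pooled, max_pool(child))]
    return pooled


def predict_cost(pooled: Vector, w_out: Vector, b_out: float) -> float:
    """Linear regression head mapping the pooled plan embedding to a scalar estimate."""
    return sum(w * x for w, x in zip(w_out, pooled)) + b_out


# Toy plan: a join whose two inputs are table scans (features are made up).
plan = PlanNode([1.0, 0.0], PlanNode([0.0, 2.0]), PlanNode([1.0, 1.0]))
identity = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.0], [0.0, 0.5]]

conv = tree_conv(plan, w_top=identity, w_left=half, w_right=half, bias=[0.0, -1.0])
pooled = max_pool(conv)
estimate = predict_cost(pooled, w_out=[2.0, 1.0], b_out=0.5)
print(pooled, estimate)  # [1.5, 1.0] 4.5
```

Because the same three weight matrices are shared by every parent/child window, one layer handles plans of any shape, and max pooling turns the result into a fixed-size vector for the regression head. Training on GPUs usually batches many such trees, which requires padding them to a common size; that is the overhead the summary says Prestroid reduces. This sketch processes one tree at a time and does not model batching.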
Authors
1. Johan Kok Zhi Kang
2. Gaurav
3. Sien Yi Tan
4. Feng Cheng
5. Shixuan Sun
6. Bingsheng He
Incoming Citations (Sorted by Pagerank)
Showing 16 of 16 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 10 of 10 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Overall Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 6.1639801e-04 |
| 204 | Learned Cardinalities: Estimating Correlated Joins with Deep Learning | 2019 | CIDR | 3.4784455e-04 |
| 333 | Neo: A Learned Query Optimizer | 2019 | VLDB | 2.7206884e-04 |
| 806 | An End-to-End Learning-based Cost Estimator | 2020 | VLDB | 1.6434274e-04 |
| 1,737 | QuickSel: Quick Selectivity Learning with Mixture Models | 2020 | SIGMOD | 1.0720294e-04 |
| 2,084 | The Case for Predictive Database Systems: Opportunities and Challenges | 2011 | CIDR | 9.5820534e-05 |
| 3,658 | Towards a Hands-Free Query Optimizer through Deep Learning | 2019 | CIDR | 6.8704209e-05 |
| 5,473 | Facilitating SQL Query Composition and Analysis | 2020 | SIGMOD | 5.4885366e-05 |
| 5,505 | A Top-Down Approach to Achieving Performance Predictability in Database Systems | 2017 | SIGMOD | 5.4734224e-05 |
| 7,684 | AutoToken: Predicting Peak Parallelism for Big Data Analytics at Microsoft | 2020 | VLDB | 4.6796855e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,501 | DBEst: Revisiting Approximate Query Processing Engines with Machine Learning Models | 2019 | SIGMOD | 8.6453446e-05 |
| 5,637 | Database Workload Characterization with Query Plan Encoders | 2022 | VLDB | 5.3979505e-05 |
| 9,351 | On Efficient Approximate Queries over Machine Learning Models | 2023 | VLDB | 4.3524472e-05 |
| 329 | Accelerating Machine Learning Inference with Probabilistic Predicates | 2018 | SIGMOD | 2.7249545e-04 |
| 11,650 | Query-Driven Learning for Next Generation Predictive Modeling & Analytics | 2019 | SIGMOD | 4.1945683e-05 |
| 806 | An End-to-End Learning-based Cost Estimator | 2020 | VLDB | 1.6434274e-04 |
| 884 | Plan-Structured Deep Neural Network Models for Query Performance Prediction | 2019 | VLDB | 1.5654004e-04 |
| 9,120 | Deep Query Optimization | 2019 | SIGMOD | 4.392741e-05 |
| 3,625 | Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings | 2020 | SIGMOD | 6.9055212e-05 |
| 3,828 | Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction | 2022 | VLDB | 6.7208524e-05 |