Dynamic Speculative Optimizations for SQL Compilation in Apache Spark
Summary: Applies dynamic speculative optimizations to Spark SQL compilation. Runtime profiling and adaptive code generation reduce data-access and deserialization overhead on textual formats, achieving speedups of up to 4.4x on TPC-H with CSV and JSON inputs and illustrating a runtime-driven code-generation approach for Spark. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Filippo Schiavio
2. Daniele Bonetta
3. Walter Binder
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
| Overall Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,530 | Permutable Compiled Queries: Dynamically Adapting Compiled Queries without Recompiling | 2021 | VLDB | 5.4554282e-05 |
| 8,626 | Adaptive Code Generation for Data-Intensive Analytics | 2021 | VLDB | 4.4829152e-05 |
| 9,268 | Language-Agnostic Integrated Queries in a Managed Polyglot Runtime | 2021 | VLDB | 4.3657168e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 11 of 11 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,871 | Towards General and Efficient Online Tuning for Spark | 2023 | VLDB | 4.8997004e-05 |
| 1,548 | Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark | 2018 | SIGMOD | 0.00011431383 |
| 3,535 | Scaling Spark in the Real World: Performance and Usability | 2015 | VLDB | 6.9992495e-05 |
| 10,868 | LEAP: A Low-cost Spark SQL Query Optimizer using Pairwise Comparison | 2025 | VLDB | 4.1945683e-05 |
| 8,197 | SparkCruise: Workload Optimization in Managed Spark Clusters at Microsoft | 2021 | VLDB | 4.5607121e-05 |
| 66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801 |
| 8,506 | New Query Optimization Techniques in the Spark Engine of Azure Synapse | 2022 | VLDB | 4.4957661e-05 |
| 3,437 | Speculative Distributed CSV Data Parsing for Big Data Analytics | 2019 | SIGMOD | 7.0942161e-05 |
| 3,200 | Big Data Analytics with Datalog Queries on Spark | 2016 | SIGMOD | 7.3912411e-05 |
| 8,617 | A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning | 2024 | VLDB | 4.4846425e-05 |