Adaptive and Robust Query Execution for Lakehouses at Scale
Summary: This paper presents adaptive query execution (AQE) for lakehouses. The engine uses pipeline breakers to collect runtime statistics and reoptimize plans, which mitigates missing or incorrect table and column statistics and inaccurate cardinality and UDF estimates. It reports up to a 25× speedup on TPC-DS and is deployed at Databricks for exabyte-scale workloads, where it reduces data movement, spills, and memory pressure. (summarized by gpt-5-mini on Feb 09 2026)
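The reoptimization idea in the summary can be illustrated with a minimal sketch. It is not the paper's or Databricks' implementation: the `InputStats` type, the `choose_join` function, the 10 MiB threshold, and all sizes are hypothetical and chosen only to show how statistics observed at a pipeline breaker might change a join decision that was made from poor estimates.

```python
from dataclasses import dataclass

# Hypothetical cutoff below which a join input is broadcast (illustrative value).
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass(frozen=True)
class InputStats:
    """Row count and size of one join input, estimated or observed."""
    rows: int
    size_bytes: int


def choose_join(left: InputStats, right: InputStats) -> str:
    """Pick a join strategy from the statistics currently available."""
    if min(left.size_bytes, right.size_bytes) <= BROADCAST_THRESHOLD_BYTES:
        return "broadcast_hash_join"
    return "shuffle_hash_join"


# Compile-time estimates: the optimizer cannot see through a UDF filter,
# so it assumes the filtered side stays large.
estimated_left = InputStats(rows=1_000_000_000, size_bytes=200 * 1024**3)
estimated_right = InputStats(rows=50_000_000, size_bytes=5 * 1024**3)
static_plan = choose_join(estimated_left, estimated_right)

# Statistics observed once the right input has fully materialized at a
# pipeline breaker (for example, a shuffle write): the UDF kept few rows.
observed_right = InputStats(rows=20_000, size_bytes=2 * 1024**2)
adaptive_plan = choose_join(estimated_left, observed_right)

print(static_plan)
print(adaptive_plan)
```

With these made-up numbers the static decision is a shuffle join, while the decision made after observing the materialized input switches to a broadcast join, which avoids shuffling the large side.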
Authors
1. Maryann Xue
2. Yingyi Bu
3. Abhishek Somani
4. Wenchen Fan
5. Ziqi Liu
6. Steven Chen
7. Herman van Hovell
8. Bart Samwel
9. Mostafa Mokhtar
10. RK Korlapati
11. Andy Lam
12. Yunxiao Ma
13. Vuk Ercegovac
14. Jiexing Li
15. Alexander Behm
16. Yuanjian Li
17. Xiao Li
18. Sriram Krishnamurthy
19. Amit Shukla
20. Michalis Petropoulos
21. Sameer Paranjpye
22. Reynold Xin
23. Matei Zaharia
Incoming Citations (Sorted by Pagerank)
Showing 9 of 9 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,895 | Decentralized Actor Scheduling and Reference-based Storage in Xorbits: a Native Scalable Data Science Engine | 2025 | VLDB | 4.8925595e-05 |
| 9,093 | Databricks Lakeguard: Supporting Fine-grained Access Control and Multi-user Capabilities for Apache Spark Workloads | 2025 | SIGMOD | 4.398149e-05 |
| 9,747 | Still Asking: How Good Are Query Optimizers, Really? | 2025 | VLDB | 4.2897489e-05 |
| 9,763 | The UDFBench Benchmark for General-purpose UDF Queries | 2025 | VLDB | 4.2856106e-05 |
| 10,241 | Robust Predicate Transfer with Dynamic Execution | 2026 | VLDB | 4.1945683e-05 |
| 10,767 | The HANA Native Query Engine for Lakehouse Systems | 2025 | VLDB | 4.1945683e-05 |
| 10,772 | veDB-HTAP: a Highly Integrated, Efficient and Adaptive HTAP System | 2025 | VLDB | 4.1945683e-05 |
| 10,859 | Graph Transformers for Query Plan Representation: Potentials and Challenges | 2025 | VLDB | 4.1945683e-05 |
| 13,096 | Blink Twice - Automatic Workload Pinning and Regression Detection for Versionless Apache Spark using Retries | 2025 | SIGMOD | - |
Outgoing Citations (Sorted by Pagerank)
Showing 24 of 24 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,318 | Analyzing and Comparing Lakehouse Storage Systems | 2023 | CIDR | 5.5715872e-05 |
| 13,359 | Robust Data Transformations | 2015 | CIDR | - |
| 5,297 | Continuous Cloud-Scale Query Optimization and Processing | 2013 | VLDB | 5.5801669e-05 |
| 8,582 | Towards Query Optimizer as a Service (QOaaS) in a Unified LakeHouse Ecosystem: Can One QO Rule Them All? | 2025 | CIDR | 4.492033e-05 |
| 2,473 | Photon: A Fast Query Engine for Lakehouse Systems | 2022 | SIGMOD | 8.7237281e-05 |
| 6,402 | BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse | 2024 | SIGMOD | 5.079818e-05 |
| 8,617 | A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning | 2024 | VLDB | 4.4846425e-05 |
| 1,429 | A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses | 2009 | VLDB | 0.00012033518 |
| 10,248 | Active Data Lakes: Regaining Physical Data Independence Without Losing Interoperability | 2026 | VLDB | 4.1945683e-05 |
| 7,907 | Petabyte-Scale Row-Level Operations in Data Lakehouses | 2024 | VLDB | 4.6205839e-05 |