PTO: A Workload-driven Predictive Table Optimizer for Lakehouse Systems
Summary: PTO tunes lakehouse table layouts from the workload: it predicts partitioning, sort, file-size, and bin-packing parameters from query predicates, combining heuristic candidate pruning, table sampling, and Gradient Boosting to avoid exhaustive search. It is implemented for Presto/Iceberg and reports significant latency gains on TPC-H and TPC-DS. (summarized by gpt-5-mini on Apr 11 2026)
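To make the candidate-pruning idea in the summary concrete, here is a minimal Python sketch. It is not PTO's actual heuristic, which the summary does not specify. It assumes that columns appearing often in query predicates are the most promising partitioning or sort candidates. The function name `prune_candidates`, the thresholds `max_candidates` and `min_share`, and the sample workload are all hypothetical.

```python
from collections import Counter


def prune_candidates(workload, max_candidates=2, min_share=0.3):
    """Return the predicate columns worth evaluating as layout candidates.

    `workload` is a list of queries, each given as the list of column names
    used in its predicates. This frequency rule is an illustrative
    assumption, not the heuristic used by PTO.
    """
    total = len(workload)
    if total == 0:
        return []

    counts = Counter()
    for predicate_columns in workload:
        # Count each column at most once per query.
        counts.update(set(predicate_columns))

    # Keep columns filtered on by at least `min_share` of the queries.
    frequent = [(col, n) for col, n in counts.items() if n / total >= min_share]
    # Most frequent first; ties broken by name for deterministic output.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [col for col, _ in frequent[:max_candidates]]


# Hypothetical workload: predicate columns of five TPC-H-style queries.
workload = [
    ["l_shipdate", "l_discount"],
    ["l_shipdate"],
    ["l_shipdate", "l_quantity"],
    ["o_orderdate"],
    ["l_discount", "l_shipdate"],
]
candidates = prune_candidates(workload)
print(candidates)
```

In a pipeline like the one the summary describes, the surviving candidates would then be evaluated on a table sample and scored by a learned model, rather than enumerating every possible layout.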
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 12 of 12 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,098 | Optimizing Queries over Partitioned Tables in MPP Systems | 2014 | SIGMOD | 4.833012e-05 |
| 8,347 | QPPT: Query Processing on Prefix Trees | 2013 | CIDR | 4.5410746e-05 |
| 2,575 | A Latency and Fault-Tolerance Optimizer for Online Parallel Query Plans | 2011 | SIGMOD | 8.5133576e-05 |
| 9,689 | LST-Bench: Benchmarking Log-Structured Tables in the Cloud | 2024 | SIGMOD | 4.3043822e-05 |
| 7,059 | Adaptive and Robust Query Execution for Lakehouses at Scale | 2024 | VLDB | 4.8477825e-05 |
| 7,907 | Petabyte-Scale Row-Level Operations in Data Lakehouses | 2024 | VLDB | 4.6205839e-05 |
| 8,617 | A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning | 2024 | VLDB | 4.4846425e-05 |
| 9,232 | AutoComp: Automated Data Compaction for Log-Structured Tables in Data Lakes | 2025 | SIGMOD | 4.3690661e-05 |
| 3,779 | Instance-Optimized Data Layouts for Cloud Analytics Workloads | 2021 | SIGMOD | 6.7747205e-05 |
| 11,084 | Presto’s History-based Query Optimizer | 2024 | VLDB | 4.1945683e-05 |