Analyzing the executions
With logs of our saved runs in bao_run.txt
and pg_run.txt
, we can next analyze them. If you are using the VM, you can move these files to /vagrant/
to access them from outside the VM.
To analyze the logs, run the analyze_bao.ipynb
Python notebook. Make sure bao_run.txt
and pg_run.txt
are in the same folder. You'll need Jupyter to run the notebook if you don't already have it.
The notebook will first generate a queries complete vs. time graph. Depending on your hardware, the results will vary. Below is what I saw on my hardware:
The graph shows the number of queries that have been completed since the start of the experiment, including the training time of Bao. Each red circle indicates a time when query execution was paused and the Bao model was trained. Obviously, when deployed, one does not have to pause query execution to retrain the Bao model, and this retraining can either be offloaded to another machine or done at a fixed time during the day, concurrent with query processing. On my hardware, Bao executed the example workload about twice as fast as the default PostgreSQL optimizer.
Where do Bao's gains come from? Finishing the workload faster could come from making every query a little faster, or making a few queries much faster. To examine this, we look at the CDF of query times, which is the next pair of graphs generated by the notebook.
The left figure shows the CDF with a linear y-axis, whereas the right figure shows the same CDF with a log scale. We can see that the majority of Bao's gains come from decreasing latency "at the tail" of the distribution: the 90%, 95%, etc. Bao incurs a slightly longer longest-running query (best visible on the left plot), and increases the query time of the fastest queries slightly (best visible on the right plot).
Bao increases the latency of the fastest query by a small margin because of increased optimization time. This is because query planning with Bao requires executing the PostgreSQL planner several times, and running inference through a neural network. If you care about a particular query taking 0.01 seconds instead of 0.05 seconds, you can always disable Bao on a per-query basis.
What about query regressions? The next table shows the latency of each query when using the PostgreSQL optimizer, and the worst / best time achieved by Bao.
PG | Bao worst | Bao best | |
---|---|---|---|
Q | |||
q1 | 275.415884 | 12.206382 | 6.005776 |
q2 | 71.049927 | 198.310226 | 9.242487 |
q3 | 10.982070 | 290.048801 | 10.805816 |
q4 | 26.890862 | 26.966064 | 1.527303 |
q5 | 9.692364 | 9.354480 | 1.350012 |
q6 | 21.741243 | 19.851484 | 7.341236 |
q7 | 51.935738 | 51.321676 | 7.288905 |
q8 | 28.725613 | 15.981973 | 5.995388 |
q9 | 15.645138 | 16.394102 | 7.327004 |
q10 | 11.720967 | 9.688347 | 7.373339 |
q11 | 15.163100 | 7.686548 | 5.853226 |
q12 | 12.934380 | 9.379889 | 4.565600 |
q13 | 18.687008 | 11.803825 | 3.417922 |
q14 | 11.100027 | 14.864732 | 7.060695 |
q15 | 9.641760 | 8.258874 | 4.153027 |
q16 | 5.312640 | 7.992982 | 1.221813 |
q17 | 6.404161 | 17.702658 | 5.868285 |
q18 | 11.912653 | 20.336241 | 6.772051 |
q19 | 9.943220 | 33.939818 | 10.330661 |
q20 | 0.143906 | 0.679753 | 0.344254 |
q21 | 1.022706 | 1.292618 | 0.921263 |
q22 | 16.113360 | 51.231996 | 8.196555 |
q23 | 12.050350 | 13.501636 | 7.194857 |
q24 | 0.025990 | 0.153196 | 0.100763 |
q25 | 3.906976 | 5.511800 | 2.178743 |
q26 | 10.439918 | 16.880665 | 7.772304 |
q27 | 0.759958 | 1.491062 | 0.461650 |
q28 | 1.784515 | 2.679448 | 1.671500 |
q29 | 0.279165 | 0.263327 | 0.113964 |
q30 | 6.967469 | 7.600260 | 5.197018 |
q31 | 1.877799 | 3.540210 | 1.459715 |
q32 | 0.981562 | 7.478595 | 0.731711 |
q33 | 2.215288 | 4.805330 | 1.749660 |
q34 | 5.175736 | 8.535833 | 2.692968 |
q35 | 6.402113 | 14.327323 | 5.302665 |
q36 | 11.992452 | 15.665057 | 8.864666 |
q37 | 12.208148 | 22.470210 | 9.716367 |
q38 | 13.334725 | 34.440121 | 10.314824 |
q39 | 8.051381 | 16.096334 | 8.621814 |
q40 | 14.709921 | 19.853354 | 11.598819 |
We can see immediately that Bao has huge gains on Q1, which takes almost 5 minutes for PostgreSQL to process. However, on Q2 and Q3, while Bao normally found a pretty good plan (the Bao best column), sometimes Bao picked a regressing plan that took significantly longer to execute than the PostgreSQL plan.
For some applications, the raw workload speedup achieved by Bao may be more important than any of these query regressions. For other applications, these query regressions may be a huge problem. Next, we discuss how to use Bao's exploration mode to avoid these regressions.