Analyzing the executions

With the logs of our runs saved in bao_run.txt and pg_run.txt, we can now analyze them. If you are using the VM, you can move these files to /vagrant/ to access them from outside the VM.
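
For example, from inside the VM (a quick sketch assuming the log files sit in the current working directory):

```python
import shutil

# Move the run logs into the Vagrant shared folder so they are visible on the host machine.
for log in ("bao_run.txt", "pg_run.txt"):
    shutil.move(log, "/vagrant/")
```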

To analyze the logs, run the analyze_bao.ipynb Python notebook, making sure bao_run.txt and pg_run.txt are in the same folder as the notebook. If you don't already have Jupyter installed, you'll need to install it to run the notebook.
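
If you prefer to poke at the logs outside the notebook, a small parser is enough. The sketch below assumes each log line records a query label followed by its latency in seconds, separated by whitespace; the actual format written by your run script may differ, so adjust the parsing accordingly. The parsed lists are reused in the later sketches.

```python
def parse_run_log(path):
    """Parse a run log into a list of (query_label, latency_seconds) in execution order.

    Assumed line format: "<query label> <latency in seconds>" -- adjust if your
    run script writes something different.
    """
    runs = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            runs.append((parts[0], float(parts[-1])))
    return runs

bao_runs = parse_run_log("bao_run.txt")
pg_runs = parse_run_log("pg_run.txt")
```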

The notebook first generates a graph of queries completed vs. time. Your results will vary depending on your hardware; below is what I saw on mine:

[Figure: Queries completed over time]

The graph shows the number of queries completed since the start of the experiment, including Bao's training time. Each red circle marks a point where query execution was paused and the Bao model was trained. Of course, in a real deployment one does not have to pause query execution to retrain the Bao model: retraining can be offloaded to another machine or scheduled at a fixed time of day, concurrent with query processing. On my hardware, Bao executed the example workload about twice as fast as the default PostgreSQL optimizer.
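
For illustration, a curve like this can be rebuilt from the parsed logs by accumulating per-query latencies in execution order and counting completions. This is a simplified sketch that ignores Bao's training pauses; the notebook's own code may differ.

```python
import itertools
import matplotlib.pyplot as plt

def completion_curve(runs):
    """Turn (query, latency) pairs in execution order into (elapsed seconds, queries completed)."""
    elapsed = list(itertools.accumulate(latency for _, latency in runs))
    completed = list(range(1, len(runs) + 1))
    return elapsed, completed

for name, runs in [("Bao", bao_runs), ("PostgreSQL", pg_runs)]:
    x, y = completion_curve(runs)
    plt.plot(x, y, label=name)

plt.xlabel("Time since start of experiment (s)")
plt.ylabel("Queries completed")
plt.legend()
plt.show()
```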

Where do Bao's gains come from? Finishing the workload faster could come from making every query a little faster, or from making a few queries much faster. To examine this, we look at the CDF of query latencies, shown in the next pair of graphs generated by the notebook.

[Figure: CDF of query completion time]

The left figure shows the CDF with a linear y-axis, whereas the right figure shows the same CDF with a log scale. We can see that the majority of Bao's gains come from decreasing latency at the tail of the distribution: the 90th and 95th percentiles and beyond. Bao's single longest-running query is slightly longer than PostgreSQL's (best visible on the left plot), and Bao slightly increases the latency of the fastest queries (best visible on the right plot).
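
If you want to reproduce a CDF like this, sort the latencies and plot them against their empirical quantiles. Below is a simplified sketch reusing the parsed logs from earlier (not the notebook's exact code; here the log scale is applied to the y-axis, so adjust if your notebook logs a different axis):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_cdf(latencies, label, ax):
    """Plot the empirical CDF of a list of query latencies (seconds)."""
    xs = np.sort(latencies)
    ys = np.arange(1, len(xs) + 1) / len(xs)
    ax.plot(xs, ys, label=label)

fig, (linear_ax, log_ax) = plt.subplots(1, 2, figsize=(10, 4))
for name, runs in [("Bao", bao_runs), ("PostgreSQL", pg_runs)]:
    latencies = [latency for _, latency in runs]
    plot_cdf(latencies, name, linear_ax)
    plot_cdf(latencies, name, log_ax)

log_ax.set_yscale("log")  # right-hand plot: same CDF with a log-scale axis
for ax in (linear_ax, log_ax):
    ax.set_xlabel("Query latency (s)")
    ax.set_ylabel("Fraction of queries completed")
    ax.legend()
plt.show()
```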

Bao adds a small amount of latency to the fastest queries because of increased optimization time: planning a query with Bao requires executing the PostgreSQL planner several times and running inference through a neural network. If you care about a particular query taking 0.01 seconds instead of 0.05 seconds, you can always disable Bao on a per-query basis.
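
For instance, assuming the Bao extension exposes an enable_bao session setting (check your installation for the exact setting name), you could wrap a latency-sensitive query like this from Python via psycopg2; the connection string and query below are placeholders:

```python
import psycopg2

# Placeholder connection parameters -- adjust for your database.
conn = psycopg2.connect("dbname=example host=localhost")
with conn.cursor() as cur:
    # Assumes the Bao extension exposes an enable_bao setting; verify the name
    # for your installation.
    cur.execute("SET enable_bao TO off")   # plan with the stock PostgreSQL optimizer
    cur.execute("SELECT 1")                # stand-in for the latency-sensitive query
    print(cur.fetchone())
    cur.execute("SET enable_bao TO on")    # re-enable Bao for subsequent queries
conn.close()
```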

What about query regressions? The next table shows the latency of each query when using the PostgreSQL optimizer, and the worst / best time achieved by Bao.

Query   PG (s)       Bao worst (s)   Bao best (s)
q1 275.415884 12.206382 6.005776
q2 71.049927 198.310226 9.242487
q3 10.982070 290.048801 10.805816
q4 26.890862 26.966064 1.527303
q5 9.692364 9.354480 1.350012
q6 21.741243 19.851484 7.341236
q7 51.935738 51.321676 7.288905
q8 28.725613 15.981973 5.995388
q9 15.645138 16.394102 7.327004
q10 11.720967 9.688347 7.373339
q11 15.163100 7.686548 5.853226
q12 12.934380 9.379889 4.565600
q13 18.687008 11.803825 3.417922
q14 11.100027 14.864732 7.060695
q15 9.641760 8.258874 4.153027
q16 5.312640 7.992982 1.221813
q17 6.404161 17.702658 5.868285
q18 11.912653 20.336241 6.772051
q19 9.943220 33.939818 10.330661
q20 0.143906 0.679753 0.344254
q21 1.022706 1.292618 0.921263
q22 16.113360 51.231996 8.196555
q23 12.050350 13.501636 7.194857
q24 0.025990 0.153196 0.100763
q25 3.906976 5.511800 2.178743
q26 10.439918 16.880665 7.772304
q27 0.759958 1.491062 0.461650
q28 1.784515 2.679448 1.671500
q29 0.279165 0.263327 0.113964
q30 6.967469 7.600260 5.197018
q31 1.877799 3.540210 1.459715
q32 0.981562 7.478595 0.731711
q33 2.215288 4.805330 1.749660
q34 5.175736 8.535833 2.692968
q35 6.402113 14.327323 5.302665
q36 11.992452 15.665057 8.864666
q37 12.208148 22.470210 9.716367
q38 13.334725 34.440121 10.314824
q39 8.051381 16.096334 8.621814
q40 14.709921 19.853354 11.598819

We can see immediately that Bao has huge gains on q1, which takes almost five minutes for PostgreSQL to process. However, on q2 and q3, while Bao usually found a good plan (the "Bao best" column), it sometimes picked a regressing plan that took significantly longer to execute than the PostgreSQL plan.
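
A table like the one above can be assembled from the parsed logs by grouping Bao's latencies per query and taking the maximum and minimum. Here is a sketch using pandas and the hypothetical parse_run_log output from earlier; the notebook's own code may differ.

```python
import pandas as pd

bao_df = pd.DataFrame(bao_runs, columns=["Q", "latency"])
pg_df = pd.DataFrame(pg_runs, columns=["Q", "latency"])

summary = pd.DataFrame({
    # If PostgreSQL ran each query once, the mean is just that single latency.
    "PG": pg_df.groupby("Q")["latency"].mean(),
    "Bao worst": bao_df.groupby("Q")["latency"].max(),
    "Bao best": bao_df.groupby("Q")["latency"].min(),
})
print(summary)
```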

For some applications, the raw workload speedup achieved by Bao may matter more than these occasional regressions. For other applications, such regressions may be a serious problem. Next, we discuss how to use Bao's exploration mode to avoid them.