Ryan Marcus, assistant professor at the University of Pennsylvania. Using machine learning to build the next generation of data systems.
Datasets
This page lists the datasets I’ve made available. If you use them, please cite them! Feel free to let me know if you use one of them as well, and I’ll link to your paper or project here. All of these datasets are available through this Dataverse.
The BibTeX entries given here come from the Dataverse export, with capitalization fixed. Some style files don’t support the @data type, so you might have to modify them a bit.
Executed JOB Queries
This dataset contains executions of each query from the Join Order Benchmark (JOB) on PostgreSQL 10.5. After the data is loaded and the appropriate indexes are created, each query is run once with a cold cache.
The files contain the output of prepending EXPLAIN (FORMAT JSON, ANALYZE) to each SQL query. Queries are executed in a VirtualBox VM created by Vagrant. The VM has two cores and 8GB of memory, and the PostgreSQL buffer size is set to 4GB. The VM runs on a machine with 16GB of RAM and an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz.
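For anyone who wants to work with these files programmatically, here is a minimal Python sketch. It assumes each file holds only the JSON array that PostgreSQL returns; if the output was captured through psql's default formatting, the "QUERY PLAN" header and line-continuation markers would need stripping first. It relies on the shape of PostgreSQL's JSON EXPLAIN output: an array with one object per statement, holding a "Plan" tree whose children sit under "Plans", plus top-level "Planning Time" and "Execution Time" values in milliseconds. The helper names (to_explain_query, summarize_plan) and the small SAMPLE plan are hypothetical and purely illustrative; SAMPLE is not taken from the dataset.

```python
import json
from collections import Counter

# The prefix described above. Prepending it makes PostgreSQL execute the
# query and return its plan, annotated with actual timings, as JSON.
EXPLAIN_PREFIX = "EXPLAIN (FORMAT JSON, ANALYZE) "


def to_explain_query(sql: str) -> str:
    """Prepend the EXPLAIN prefix to a SQL query (hypothetical helper)."""
    return EXPLAIN_PREFIX + sql.strip()


def iter_nodes(node: dict):
    """Yield a plan node and all of its descendants, depth-first."""
    yield node
    for child in node.get("Plans", []):
        yield from iter_nodes(child)


def summarize_plan(raw_json: str) -> dict:
    """Summarize one EXPLAIN (FORMAT JSON, ANALYZE) result (hypothetical helper).

    Assumes raw_json is exactly the JSON array PostgreSQL returns; only the
    first (and, for a single statement, only) element is used.
    """
    doc = json.loads(raw_json)[0]
    root = doc["Plan"]
    node_types = Counter(n["Node Type"] for n in iter_nodes(root))
    return {
        "planning_ms": doc.get("Planning Time"),
        "execution_ms": doc.get("Execution Time"),
        "root_node": root["Node Type"],
        "node_types": dict(node_types),
    }


# Illustrative, hand-written plan (NOT from the dataset), trimmed to the keys used above.
SAMPLE = """[{"Plan": {"Node Type": "Aggregate",
  "Plans": [{"Node Type": "Hash Join",
    "Plans": [{"Node Type": "Seq Scan", "Relation Name": "title"},
              {"Node Type": "Hash",
               "Plans": [{"Node Type": "Seq Scan", "Relation Name": "movie_info"}]}]}]},
  "Planning Time": 0.8, "Execution Time": 12.9}]"""

if __name__ == "__main__":
    print(to_explain_query("SELECT COUNT(*) FROM title;"))
    print(summarize_plan(SAMPLE))
    # For a real file (path is hypothetical; the archive's layout may differ):
    # from pathlib import Path
    # print(summarize_plan(Path("some_query.json").read_text()))
```

Walking the tree recursively through "Plans" keeps the sketch independent of any particular plan shape, so the same function works for the deep join trees JOB queries tend to produce. Note that ANALYZE really executes the query, so re-running the prefixed SQL against your own database will take as long as the query itself and produce timings that depend on your hardware and cache state.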
@data{RMarcusJOB,
author = {Ryan Marcus},
publisher = {Harvard Dataverse},
title = {{JOB} Queries Executed by {PostgreSQL} 10.5},
year = {2018},
doi = {10.7910/DVN/QIIAPS},
url = {https://doi.org/10.7910/DVN/QIIAPS}
}