Ryan Marcus, assistant professor at the University of Pennsylvania (Fall '23). Using machine learning to build the next generation of data systems.
      
    ____                       __  ___                          
   / __ \__  ______ _____     /  |/  /___ _____________  _______
  / /_/ / / / / __ `/ __ \   / /|_/ / __ `/ ___/ ___/ / / / ___/
 / _, _/ /_/ / /_/ / / / /  / /  / / /_/ / /  / /__/ /_/ (__  ) 
/_/ |_|\__, /\__,_/_/ /_/  /_/  /_/\__,_/_/   \___/\__,_/____/  
      /____/                                                    
        
        

Datasets

This page lists datasets I’ve made available. If you use them, please cite them! Feel free to let me know if you use a dataset as well, and I’ll link to your paper/project here. All of these datasets are available through this Dataverse.

The BibTeX entries given here are from the Dataverse export, with capitalization fixed. Some style files don’t support the @data type, so you might have to modify the entries a bit.

Executed JOB Queries

This dataset contains executions of each query from the Join Order Benchmark on PostgreSQL 10.5. The data is loaded, appropriate indexes are created, and each query is run once with a cold cache.

Each file contains the output of prepending EXPLAIN (FORMAT JSON, ANALYZE) to the SQL query. Queries are executed in a VirtualBox VM created by Vagrant. The VM has two cores and 8GB of memory. The PostgreSQL buffer size is set to 4GB. The VM is run on a machine with 16GB of RAM and an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz.
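As a rough sketch of how one of these files might be read, the Python snippet below parses EXPLAIN (FORMAT JSON, ANALYZE) output and walks the plan tree. It assumes each file holds only the JSON document. The inline SAMPLE plan, its table names, and the helper names walk and summarize are made up for illustration and are not part of the dataset. The keys it reads ("Plan", "Plans", "Node Type", "Plan Rows", "Actual Rows", "Execution Time") are the ones PostgreSQL emits in this format.

```python
import json

# Hypothetical miniature of one file's contents; real plans are much larger.
SAMPLE = """
[
  {
    "Plan": {
      "Node Type": "Hash Join",
      "Plan Rows": 100,
      "Actual Rows": 250,
      "Actual Total Time": 12.5,
      "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "title",
         "Plan Rows": 1000, "Actual Rows": 1000, "Actual Total Time": 4.0},
        {"Node Type": "Hash", "Plan Rows": 50, "Actual Rows": 40,
         "Actual Total Time": 3.0,
         "Plans": [
           {"Node Type": "Index Scan", "Relation Name": "movie_info",
            "Plan Rows": 50, "Actual Rows": 40, "Actual Total Time": 2.5}
         ]}
      ]
    },
    "Planning Time": 1.2,
    "Execution Time": 13.1
  }
]
"""


def walk(node, depth=0):
    """Yield (depth, node) for every operator in the plan tree, parents first."""
    yield depth, node
    for child in node.get("Plans", []):
        yield from walk(child, depth + 1)


def summarize(explain_text):
    """Return the execution time (ms) and per-operator row counts."""
    # EXPLAIN's JSON output is an array with one object per statement.
    doc = json.loads(explain_text)[0]
    operators = [
        (depth, n["Node Type"], n["Plan Rows"], n["Actual Rows"])
        for depth, n in walk(doc["Plan"])
    ]
    return doc["Execution Time"], operators


latency_ms, operators = summarize(SAMPLE)
print(f"execution time: {latency_ms} ms")
for depth, node_type, est, actual in operators:
    print(f"{'  ' * depth}{node_type}: estimated {est} rows, actual {actual}")
```

To process a real file, pass its contents to summarize instead of SAMPLE. Note that PostgreSQL reports "Actual Rows" per loop, so operators that run more than once may need to be scaled by "Actual Loops".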

@data{RMarcusJOB,
  author = {Ryan Marcus},
  publisher = {Harvard Dataverse},
  title = {{JOB} Queries Executed by {PostgreSQL} 10.5},
  year = {2018},
  doi = {10.7910/DVN/QIIAPS},
  url = {https://doi.org/10.7910/DVN/QIIAPS}
}