Database Paper Browser

Runtime Variation in Big Data Analytics

Summary: Two-step predictor for runtime distribution: shape features plus a classifier with >96% accuracy. First large-scale study predicting enterprise analytics runtime categories; enables what-if analyses on allocation and scheduling. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6570
Venue: SIGMOD
Year: 2023
Pagerank: 4.653651e-05
Overall Rank: 7,778 | 45.90%
DOI: 10.1145/3588921

Incoming Citations (Sorted by Pagerank)

Showing 2 of 2 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 19 of 19 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613
70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166
183 | Automatic Database Management System Tuning Through Large-scale Machine Learning | 2017 | SIGMOD | 0.00036721403
514 | An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning | 2019 | SIGMOD | 0.0002124895
953 | Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance | 2010 | VLDB | 0.00015095431
2,372 | Predictable Performance for Unpredictable Workloads | 2009 | VLDB | 8.947963e-05
2,817 | Recurring Job Optimization in Scope | 2012 | SIGMOD | 8.0677653e-05
3,038 | Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics | 2017 | SIGMOD | 7.6717218e-05
4,061 | Advanced Partitioning Techniques for Massively Distributed Computation | 2012 | SIGMOD | 6.483587e-05
4,127 | A Statistical Perspective on Discovering Functional Dependencies in Noisy Data | 2020 | SIGMOD | 6.4310458e-05
4,248 | Hyper Dimension Shuffle: Efficient Data Repartition at Petabyte Scale in SCOPE | 2019 | VLDB | 6.3247927e-05
5,297 | Continuous Cloud-Scale Query Optimization and Processing | 2013 | VLDB | 5.5801669e-05
5,505 | A Top-Down Approach to Achieving Performance Predictability in Database Systems | 2017 | SIGMOD | 5.4734224e-05
6,209 | AutoExecutor: Predictive Parallelism for Spark SQL Queries | 2021 | VLDB | 5.1565972e-05
6,261 | The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward | 2021 | VLDB | 5.1350714e-05
6,757 | KEA: Tuning an Exabyte-Scale Data Infrastructure | 2021 | SIGMOD | 4.9372134e-05
7,067 | JetScope: Reliable and Interactive Analytics at Cloud Scale | 2015 | VLDB | 4.8440936e-05
7,684 | AutoToken: Predicting Peak Parallelism for Big Data Analytics at Microsoft | 2020 | VLDB | 4.6796855e-05
9,194 | Phoebe: A Learning-based Checkpoint Optimizer | 2021 | VLDB | 4.3761777e-05

Semantically Similar Papers