Juggler: Autonomous Cost Optimization and Performance Prediction of Big Data Applications
Summary: Juggler autonomously selects which datasets to cache and recommends cluster configurations for in-memory iterative big-data workloads. Reported results: 90% prediction accuracy; optimal or near-optimal configurations in about 50% of cases; runtime reduced to 25% and cost to 58% of baseline.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 6344
- Venue: SIGMOD
- Year: 2022
- Pagerank: 4.1945683e-05
- Overall Rank: 11,341 | 21.11%
- DOI: 10.1145/3514221.3517892
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 14 of 14 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801 |
| 204 | Learned Cardinalities: Estimating Correlated Joins with Deep Learning | 2019 | CIDR | 0.00034784455 |
| 557 | SystemML: Declarative Machine Learning on Spark | 2016 | VLDB | 0.00020197988 |
| 758 | Deep Unsupervised Cardinality Estimation | 2020 | VLDB | 0.0001706608 |
| 953 | Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance | 2010 | VLDB | 0.00015095431 |
| 1,071 | Starfish: A Self-tuning System for Big Data Analytics | 2011 | CIDR | 0.00014312777 |
| 1,105 | Cardinality Estimation Done Right: Index-Based Join Sampling | 2017 | CIDR | 0.00013990395 |
| 1,902 | Black or White? How to Develop an AutoTuner for Memory-based Analytics | 2020 | SIGMOD | 0.00010157713 |
| 1,922 | Selecting Subexpressions to Materialize at Datacenter Scale | 2018 | VLDB | 0.00010082599 |
| 2,645 | WATCHMAN: A Data Warehouse Intelligent Cache Manager | 1996 | VLDB | 8.3829312e-05 |
| 3,013 | Cardinality Estimation Using Sample Views with Quality Assurance | 2007 | SIGMOD | 7.7137441e-05 |
| 3,200 | Big Data Analytics with Datalog Queries on Spark | 2016 | SIGMOD | 7.3912411e-05 |
| 5,688 | PREDIcT: Towards Predicting the Runtime of Large Scale Iterative Analytics | 2013 | VLDB | 5.3702808e-05 |
| 8,464 | Piranha: Optimizing Short Jobs in Hadoop | 2013 | VLDB | 4.5052127e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 9,504 | Supporting Scalable Analytics with Latency Constraints | 2015 | VLDB | 4.3341665e-05 |
| 4,687 | Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures | 2023 | VLDB | 5.9986055e-05 |
| 7,402 | A General-Purpose Query-Centric Framework for Querying Big Graphs | 2016 | VLDB | 4.7392415e-05 |
| 11,155 | Predicting Query Execution time for JIT Compiled Database Engines | 2023 | CIDR | 4.1945683e-05 |
| 5,301 | ReCache: Reactive Caching for Fast Analytics over Heterogeneous Data | 2018 | VLDB | 5.5790928e-05 |
| 3,625 | Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings | 2020 | SIGMOD | 6.9055212e-05 |
| 6,053 | Optimizing Machine Learning Workloads in Collaborative Environments | 2020 | SIGMOD | 5.2326838e-05 |
| 6,104 | Automating Distributed Tiered Storage Management in Cluster Computing | 2020 | VLDB | 5.2080102e-05 |
| 4,802 | Resource Elasticity for Large-Scale Machine Learning | 2015 | SIGMOD | 5.9114415e-05 |
| 11,056 | Agile-Ant: Self-managing Distributed Cache Management for Cost Optimization of Big Data Applications | 2024 | VLDB | 4.1945683e-05 |