Cerebro: A Layered Data Platform for Scalable Deep Learning
Summary: Cerebro is a layered data platform that treats deep learning (DL) model selection as a first-class data-management problem, offering higher-level APIs and a DB-inspired architecture for coordinated large-scale model and hyperparameter search. It uses multi-query optimization to share computation and reduce resource waste.
(summarized by gpt-5-mini on Feb 09 2026)
- Paper ID: 424
- Venue: CIDR
- Year: 2021
- Pagerank: 4.4326439e-05
- Overall Rank: 8,864 | 38.34%
- DOI: -
[Chart: Incoming Non-self Citations Over Time]
Incoming Citations (Sorted by Pagerank)
Showing 10 of 10 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,557 | Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches | 2021 | VLDB | 6.087611e-05 |
| 6,884 | Lotan: Bridging the Gap between GNNs and Scalable Graph Analytics Engines | 2023 | VLDB | 4.8955332e-05 |
| 9,172 | GraphGem: Optimized Scalable System for Graph Convolutional Networks | 2021 | SIGMOD | 4.3845844e-05 |
| 9,222 | Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning | 2021 | VLDB | 4.3698672e-05 |
| 9,223 | Intermittent Human-in-the-Loop Model Selection using Cerebro: A Demonstration | 2021 | VLDB | 4.3698672e-05 |
| 9,596 | Scalable Graph Convolutional Network Training on Distributed-Memory Systems | 2023 | VLDB | 4.319218e-05 |
| 9,603 | Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads | 2024 | VLDB | 4.3177432e-05 |
| 10,976 | StarfishDB: a Query Execution Engine for Relational Probabilistic Programming | 2024 | SIGMOD | 4.1945683e-05 |
| 11,447 | Grouped Learning: Group-By Model Selection Workloads | 2021 | SIGMOD | 4.1945683e-05 |
| 13,171 | Reimagining Deep Learning Systems Through the Lens of Data Systems | 2024 | VLDB | - |
Outgoing Citations (Sorted by Pagerank)
Showing 20 of 20 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 140 | The MADlib Analytics Library or MAD Skills, the SQL | 2012 | VLDB | 0.00042270404 |
| 411 | PyTorch Distributed: Experiences on Accelerating Data Parallel Training | 2020 | VLDB | 0.00023906921 |
| 658 | Towards a Unified Architecture for in-RDBMS Analytics | 2012 | SIGMOD | 0.00018506577 |
| 683 | Cerebro: A Data System for Optimized Deep Learning Model Selection | 2020 | VLDB | 0.00018195476 |
| 761 | Materialization Optimizations for Feature Selection Workloads | 2014 | SIGMOD | 0.00017053783 |
| 1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713 |
| 1,279 | Towards Linear Algebra over Normalized Data | 2017 | VLDB | 0.00012868394 |
| 1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681 |
| 2,194 | Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra | 2019 | SIGMOD | 9.3138337e-05 |
| 2,863 | Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations | 2019 | SIGMOD | 7.9877991e-05 |
| 2,886 | VISTA: Optimized System for Declarative Feature Transfer from Deep CNNs at Scale | 2020 | SIGMOD | 7.9612767e-05 |
| 2,915 | Brainwash: A Data System for Feature Engineering | 2013 | CIDR | 7.9078385e-05 |
| 3,206 | Panorama: A Data System for Unbounded Vocabulary Querying over Video | 2020 | VLDB | 7.3826363e-05 |
| 3,948 | A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics | 2018 | VLDB | 6.5959084e-05 |
| 4,557 | Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches | 2021 | VLDB | 6.087611e-05 |
| 4,785 | Demonstration of Santoku: Optimizing Machine Learning over Normalized Data | 2015 | VLDB | 5.9236989e-05 |
| 6,538 | Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent | 2019 | SIGMOD | 5.023239e-05 |
| 7,273 | Feature Selection in Enterprise Analytics: A Demonstration using an R-based Data Analytics System | 2013 | VLDB | 4.7810804e-05 |
| 8,378 | Probabilistic Management of OCR Data using an RDBMS | 2012 | VLDB | 4.5320288e-05 |
| 13,313 | Demonstration of Krypton: Optimized CNN Inference for Occlusion-based Deep CNN Explanations | 2019 | VLDB | - |
Semantically Similar Papers