In-Database Machine Learning with CorgiPile: Stochastic Gradient Descent without Full Data Shuffle
Summary: Proposes CorgiPile, a hierarchical data shuffling method for in-database SGD that avoids a full data shuffle while preserving convergence. The paper contributes a systematic study of existing shuffling strategies, a convergence analysis, and a PostgreSQL integration built on three new operators. It reports 1.6–12.8× speedups over MADlib and Bismarck on both HDD and SSD.
(summarized by gpt-5-nano on Feb 09 2026)
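The hierarchical shuffle mentioned in the summary can be illustrated with a small sketch. It assumes that "hierarchical" means two levels: first shuffle the order of storage blocks, then shuffle the tuples inside an in-memory buffer that holds a few blocks at a time. The function name `hierarchical_shuffle` and the parameters `blocks`, `buffer_blocks`, and `rng` are hypothetical names chosen for illustration; this is not the paper's PostgreSQL operator code.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def hierarchical_shuffle(
    blocks: Sequence[Sequence[T]],
    buffer_blocks: int,
    rng: random.Random,
) -> Iterator[T]:
    """Yield every tuple exactly once in a two-level shuffled order.

    Assumption: data is stored as contiguous blocks, and only
    `buffer_blocks` blocks fit in memory at the same time.
    """
    if buffer_blocks < 1:
        raise ValueError("buffer_blocks must be at least 1")

    # Level 1: shuffle the block order, so each block is still read
    # sequentially but blocks are visited in random order.
    order = list(range(len(blocks)))
    rng.shuffle(order)

    for start in range(0, len(order), buffer_blocks):
        # Fill the buffer with the next group of randomly chosen blocks.
        buffer: List[T] = []
        for block_id in order[start:start + buffer_blocks]:
            buffer.extend(blocks[block_id])

        # Level 2: shuffle the tuples inside the buffer only.
        rng.shuffle(buffer)
        yield from buffer
```

An SGD loop would consume the iterator one tuple (or one mini-batch) at a time and call `hierarchical_shuffle` again at the start of each epoch. Under the stated assumption, no step ever requires a random permutation of the whole table.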
- Paper ID: 6462
- Venue: SIGMOD
- Year: 2022
- Pagerank: 5.7091191e-05
- Overall Rank: 5,084 | 64.64%
- DOI: 10.1145/3514221.3526150
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 18 of 18 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 140 | The MADlib Analytics Library or MAD Skills, the SQL | 2012 | VLDB | 0.00042270404 |
| 658 | Towards a Unified Architecture for in-RDBMS Analytics | 2012 | SIGMOD | 0.00018506577 |
| 683 | Cerebro: A Data System for Optimized Deep Learning Model Selection | 2020 | VLDB | 0.00018195476 |
| 834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159 |
| 850 | Scaling Factorization Machines to Relational Data | 2013 | VLDB | 0.00015955971 |
| 1,044 | DimmWitted: A Study of Main-Memory Statistical Analytics | 2014 | VLDB | 0.00014475229 |
| 1,158 | Simulation of Database-Valued Markov Chains Using SimSQL | 2013 | SIGMOD | 0.0001361064 |
| 1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713 |
| 1,279 | Towards Linear Algebra over Normalized Data | 2017 | VLDB | 0.00012868394 |
| 1,942 | Heterogeneity-aware Distributed Parameter Servers | 2017 | SIGMOD | 0.00010012691 |
| 2,642 | Vertica-ML: Distributed Machine Learning in Vertica Database | 2020 | SIGMOD | 8.3851878e-05 |
| 3,099 | DB4ML – An In-Memory Database Kernel with Machine Learning Support | 2020 | SIGMOD | 7.5642871e-05 |
| 4,159 | F: Regression Models over Factorized Views | 2016 | VLDB | 6.3993326e-05 |
| 4,557 | Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches | 2021 | VLDB | 6.087611e-05 |
| 5,821 | Tensor Relational Algebra for Distributed Machine Learning System Design | 2021 | VLDB | 5.3134851e-05 |
| 6,191 | Automatic Optimization of Matrix Implementations for Distributed Machine Learning and Linear Algebra | 2021 | SIGMOD | 5.1642282e-05 |
| 6,404 | ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation | 2019 | VLDB | 5.0786954e-05 |
| 9,706 | Distributed Numerical and Machine Learning Computations via Two-Phase Execution of Aggregated Join Trees | 2021 | VLDB | 4.2992942e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,504 | Analyzing and Mitigating Data Stalls in DNN Training | 2021 | VLDB | 0.00011642333 |
| 7,008 | Is Your Learned Query Optimizer Behaving As You Expect? A Machine Learning Perspective | 2024 | VLDB | 4.8643538e-05 |
| 7,061 | Serving Deep Learning Models with Deduplication from Relational Databases | 2022 | VLDB | 4.8463881e-05 |
| 8,220 | PerfGuard: Deploying ML-for-Systems without Performance Regressions, Almost! | 2021 | VLDB | 4.5557328e-05 |
| 6,538 | Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent | 2019 | SIGMOD | 5.023239e-05 |
| 9,222 | Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning | 2021 | VLDB | 4.3698672e-05 |
| 9,776 | Structure-Aware Machine Learning over Multi-Relational Databases | 2021 | SIGMOD | 4.2856106e-05 |
| 7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 4.8078895e-05 |
| 6,404 | ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation | 2019 | VLDB | 5.0786954e-05 |
| 4,395 | Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models | 2017 | VLDB | 6.2244283e-05 |