Database Paper Browser

In-Database Machine Learning with CorgiPile: Stochastic Gradient Descent without Full Data Shuffle

Summary: Proposes CorgiPile, a hierarchical data shuffling method for in-database SGD that avoids a full data shuffle yet preserves convergence. The paper contributes a systematic study of existing shuffling strategies, a convergence analysis, and a PostgreSQL integration via three new operators, and reports 1.6–12.8× speedups over MADlib and Bismarck on HDD and SSD. (summarized by gpt-5-nano on Feb 09 2026)
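
The summary does not spell out how the hierarchical shuffle works, so the following is only a minimal sketch under stated assumptions: data is stored in blocks, the block read order is randomized first, and tuples are then shuffled inside a small in-memory buffer holding a few blocks. The names `hierarchical_shuffle` and `buffer_blocks` are hypothetical and are not taken from the paper or from its PostgreSQL operators.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def hierarchical_shuffle(
    blocks: Sequence[Sequence[T]],
    buffer_blocks: int,
    rng: random.Random,
) -> Iterator[T]:
    """Yield every tuple exactly once without shuffling the whole dataset.

    Assumption (not from the source page): two levels of randomness,
    block order first, then tuple order inside a bounded buffer.
    """
    if buffer_blocks < 1:
        raise ValueError("buffer_blocks must be at least 1")

    # Level 1: randomize the order in which blocks are read.
    order = list(range(len(blocks)))
    rng.shuffle(order)

    # Level 2: load a few blocks into a buffer, shuffle its tuples, emit them.
    for start in range(0, len(order), buffer_blocks):
        buffer: List[T] = []
        for block_id in order[start:start + buffer_blocks]:
            buffer.extend(blocks[block_id])
        rng.shuffle(buffer)
        yield from buffer


# Usage: four blocks of three tuples each, with a buffer of two blocks.
blocks = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
stream = list(hierarchical_shuffle(blocks, buffer_blocks=2, rng=random.Random(0)))
```

In this sketch the buffer size bounds memory use, and each block is read once as a contiguous unit, which is the property that makes such a scheme attractive on HDD and SSD. An SGD loop would consume `stream` one tuple at a time per epoch.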

Paper ID: 6462
Venue: SIGMOD
Year: 2022
Pagerank: 5.7091191e-05
Overall Rank: 5,084 | 64.64%
DOI: 10.1145/3514221.3526150

[Figure: Incoming non-self citations over time]

Authors

Incoming Citations (Sorted by Pagerank)

Showing 5 of 5 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 18 of 18 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
140 | The MADlib Analytics Library or MAD Skills, the SQL | 2012 | VLDB | 0.00042270404
658 | Towards a Unified Architecture for in-RDBMS Analytics | 2012 | SIGMOD | 0.00018506577
683 | Cerebro: A Data System for Optimized Deep Learning Model Selection | 2020 | VLDB | 0.00018195476
834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159
850 | Scaling Factorization Machines to Relational Data | 2013 | VLDB | 0.00015955971
1,044 | DimmWitted: A Study of Main-Memory Statistical Analytics | 2014 | VLDB | 0.00014475229
1,158 | Simulation of Database-Valued Markov Chains Using SimSQL | 2013 | SIGMOD | 0.0001361064
1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713
1,279 | Towards Linear Algebra over Normalized Data | 2017 | VLDB | 0.00012868394
1,942 | Heterogeneity-aware Distributed Parameter Servers | 2017 | SIGMOD | 0.00010012691
2,642 | Vertica-ML: Distributed Machine Learning in Vertica Database | 2020 | SIGMOD | 8.3851878e-05
3,099 | DB4ML – An In-Memory Database Kernel with Machine Learning Support | 2020 | SIGMOD | 7.5642871e-05
4,159 | F: Regression Models over Factorized Views | 2016 | VLDB | 6.3993326e-05
4,557 | Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches | 2021 | VLDB | 6.087611e-05
5,821 | Tensor Relational Algebra for Distributed Machine Learning System Design | 2021 | VLDB | 5.3134851e-05
6,191 | Automatic Optimization of Matrix Implementations for Distributed Machine Learning and Linear Algebra | 2021 | SIGMOD | 5.1642282e-05
6,404 | ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation | 2019 | VLDB | 5.0786954e-05
9,706 | Distributed Numerical and Machine Learning Computations via Two-Phase Execution of Aggregated Join Trees | 2021 | VLDB | 4.2992942e-05

Semantically Similar Papers