Database Paper Browser


Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning

Summary: The paper proposes grouped learning, a GROUP BY-like abstraction for ML over subgroups. It presents Gradient Accumulation Parallelism (GAP) and a hybrid task- and data-parallel approach, implemented in Kingpin on Ray, with reported speedups of up to 4x–14x over the state of the art. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12410
Venue: VLDB
Year: 2021
Pagerank: 4.3698672e-05
Overall Rank: 9,222 | 35.85%
DOI: 10.14778/3476249.3476284

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 19 of 19 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
140 | The MADlib Analytics Library or MAD Skills, the SQL | 2012 | VLDB | 0.00042270404
411 | PyTorch Distributed: Experiences on Accelerating Data Parallel Training | 2020 | VLDB | 0.00023906921
557 | SystemML: Declarative Machine Learning on Spark | 2016 | VLDB | 0.00020197988
658 | Towards a Unified Architecture for in-RDBMS Analytics | 2012 | SIGMOD | 0.00018506577
683 | Cerebro: A Data System for Optimized Deep Learning Model Selection | 2020 | VLDB | 0.00018195476
834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159
850 | Scaling Factorization Machines to Relational Data | 2013 | VLDB | 0.00015955971
1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713
1,279 | Towards Linear Algebra over Normalized Data | 2017 | VLDB | 0.00012868394
1,391 | Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads | 2018 | VLDB | 0.0001223506
1,402 | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML | 2014 | VLDB | 0.00012180605
2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05
2,194 | Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra | 2019 | SIGMOD | 9.3138337e-05
3,918 | On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML | 2018 | VLDB | 6.6315176e-05
4,159 | F: Regression Models over Factorized Views | 2016 | VLDB | 6.3993326e-05
4,785 | Demonstration of Santoku: Optimizing Machine Learning over Normalized Data | 2015 | VLDB | 5.9236989e-05
4,975 | An Experimental Evaluation of Large Scale GBDT Systems | 2019 | VLDB | 5.79026e-05
8,864 | Cerebro: A Layered Data Platform for Scalable Deep Learning | 2021 | CIDR | 4.4326439e-05
9,117 | Ease.ml in Action: Towards Multi-tenant Declarative Learning Services | 2018 | VLDB | 4.3928617e-05

Semantically Similar Papers