Database Paper Browser

Learning Generalized Linear Models Over Normalized Data

Summary: Introduces factorized learning for GLMs trained with gradient descent over normalized data: computations are pushed through joins to avoid redundant I/O. Empirical results show factorized learning to be faster than join-then-learn, and a cost-based strategy is used to choose between the approaches. The work extends to multi-table joins and Hive. (summarized by gpt-5-nano on Feb 09 2026)
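
To make the idea of pushing computations through a join concrete, here is a minimal sketch, not the paper's implementation. It assumes a two-table key-foreign-key schema and uses logistic regression as an illustrative GLM. The tables `S` and `R`, the helper names, and the toy values are all hypothetical. The factorized variant computes the partial inner product over each `R` tuple once and reuses it for every matching `S` tuple, then multiplies each `R` tuple's features by the summed per-key error once, instead of repeating that work on every joined row.

```python
import math

# Hypothetical toy data: S(fk, x_s, y) references R(rid -> x_r) by foreign key.
R = {1: [1.0, 2.0], 2: [0.5, -1.0]}
S = [
    (1, [0.2], 1.0),
    (1, [-0.4], 0.0),
    (2, [1.5], 1.0),
    (2, [0.3], 0.0),
    (1, [0.9], 1.0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def grad_materialized(w_s, w_r):
    """Join-then-learn: every S row carries its own copy of the R features."""
    g_s = [0.0] * len(w_s)
    g_r = [0.0] * len(w_r)
    for fk, x_s, y in S:
        x_r = R[fk]  # repeated for every S row that shares this key
        err = sigmoid(dot(w_s, x_s) + dot(w_r, x_r)) - y
        for j, v in enumerate(x_s):
            g_s[j] += err * v
        for j, v in enumerate(x_r):
            g_r[j] += err * v
    return g_s, g_r

def grad_factorized(w_s, w_r):
    """Factorized: touch each R tuple's features once per pass over R."""
    # Partial inner products over R, computed once per R tuple.
    partial = {rid: dot(w_r, x_r) for rid, x_r in R.items()}
    err_sum = {rid: 0.0 for rid in R}
    g_s = [0.0] * len(w_s)
    for fk, x_s, y in S:
        err = sigmoid(dot(w_s, x_s) + partial[fk]) - y
        for j, v in enumerate(x_s):
            g_s[j] += err * v
        err_sum[fk] += err  # aggregate the error per foreign key
    # Second pass over R: scale each R tuple by its aggregated error.
    g_r = [0.0] * len(w_r)
    for rid, x_r in R.items():
        for j, v in enumerate(x_r):
            g_r[j] += err_sum[rid] * v
    return g_s, g_r
```

Both functions return the same gradient up to floating-point rounding. The difference is only in how often the `R` features are read and multiplied, which is where the savings would come from when many `S` tuples share each `R` tuple.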

Paper ID: 4933
Venue: SIGMOD
Year: 2015
Pagerank: 0.00013547713
Overall Rank: 1,167 | 91.89%
DOI: 10.1145/2723372.2723713

Authors

Incoming Citations (Sorted by Pagerank)

Showing 50 of 52 citing papers.

Rank Citing Paper Year Venue Pagerank
834 Learning Linear Regression Models over Factorized Joins 2016 SIGMOD 0.00016135159
903 To Join or Not to Join? Thinking Twice about Joins before Feature Selection 2016 SIGMOD 0.0001547016
1,279 Towards Linear Algebra over Normalized Data 2017 VLDB 0.00012868394
1,369 Random Sampling over Joins Revisited 2018 SIGMOD 0.00012339777
1,532 Data Management in Machine Learning: Challenges, Techniques, and Systems 2017 SIGMOD 0.00011472681
1,666 HELIX: Holistic Optimization for Accelerating Iterative Machine Learning 2019 VLDB 0.0001096361
1,942 Heterogeneity-aware Distributed Parameter Servers 2017 SIGMOD 0.00010012691
2,122 SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle 2020 CIDR 9.4989076e-05
2,154 DIFF: A Relational Interface for Large-Scale Data Explanation 2019 VLDB 9.4208667e-05
2,194 Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra 2019 SIGMOD 9.3138337e-05
2,350 An Intermediate Representation for Optimizing Machine Learning Pipelines 2019 VLDB 8.9788641e-05
2,886 VISTA: Optimized System for Declarative Feature Transfer from Deep CNNs at Scale 2020 SIGMOD 7.9612767e-05
3,006 On Functional Aggregate Queries with Additive Inequalities 2019 PODS 7.7299363e-05
3,277 A Layered Aggregate Engine for Analytics Workloads 2019 SIGMOD 7.2871625e-05
3,407 End-to-end Optimization of Machine Learning Prediction Queries 2022 SIGMOD 7.1295646e-05
3,948 A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics 2018 VLDB 6.5959084e-05
3,958 MLog: Towards Declarative In-Database Machine Learning 2017 VLDB 6.5897636e-05
4,033 In-RDBMS Hardware Acceleration of Advanced Analytics 2018 VLDB 6.5113267e-05
4,129 Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers? 2018 VLDB 6.428887e-05
4,197 Incremental View Maintenance with Triple Lock Factorization Benefits 2018 SIGMOD 6.367895e-05
4,395 Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models 2017 VLDB 6.2244283e-05
4,402 Smurf: Self-Service String Matching Using Random Forests 2019 VLDB 6.2195162e-05
4,505 SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning 2017 CIDR 6.1327108e-05
4,725 GeCo: Quality Counterfactual Explanations in Real Time 2021 VLDB 5.9697637e-05
4,785 Demonstration of Santoku: Optimizing Machine Learning over Normalized Data 2015 VLDB 5.9236989e-05
4,787 The Relational Data Borg is Learning 2020 VLDB 5.9224501e-05
5,084 In-Database Machine Learning with CorgiPile: Stochastic Gradient Descent without Full Data Shuffle 2022 SIGMOD 5.7091191e-05
5,806 BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees 2019 SIGMOD 5.3200643e-05
5,855 Optimal Join Algorithms Meet Top-k 2020 SIGMOD 5.3006096e-05
5,951 PGMJoins: Random Join Sampling with Graphical Models 2021 SIGMOD 5.2592385e-05
5,962 Beyond Equi-joins: Ranking, Enumeration and Factorization 2021 VLDB 5.2536266e-05
6,156 Optimizing Tensor Programs on Flexible Storage 2023 SIGMOD 5.1802603e-05
6,330 Efficient Construction of Approximate Ad-Hoc ML models Through Materialization and Reuse 2018 VLDB 5.1077416e-05
6,404 ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation 2019 VLDB 5.0786954e-05
6,538 Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent 2019 SIGMOD 5.023239e-05
6,986 A Cost-based Optimizer for Gradient Descent Optimization 2017 SIGMOD 4.8727048e-05
7,179 Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning 2023 VLDB 4.8078895e-05
7,664 Schema Independent Relational Learning 2017 SIGMOD 4.6857329e-05
8,279 Galley: Modern Query Optimization for Sparse Tensor Programs 2025 SIGMOD 4.5435639e-05
8,595 Towards A Polyglot Framework for Factorized ML 2021 VLDB 4.4889397e-05
8,786 AWARE: Workload-aware, Redundancy-exploiting Linear Algebra 2023 SIGMOD 4.4521262e-05
8,864 Cerebro: A Layered Data Platform for Scalable Deep Learning 2021 CIDR 4.4326439e-05
9,222 Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning 2021 VLDB 4.3698672e-05
9,437 BlockJoin: Efficient Matrix Partitioning Through Joins 2017 VLDB 4.3425552e-05
9,856 In-Database Data Imputation 2024 SIGMOD 4.269353e-05
10,003 Clustering with Set Outliers and Applications in Relational Clustering 2026 PODS 4.1945683e-05
10,177 InferF: Declarative Factorization of AI/ML Inferences over Joins 2026 SIGMOD 4.1945683e-05
10,291 Morphing-based Compression for Data-centric ML Pipelines 2026 VLDB 4.1945683e-05
10,499 Privacy and Accuracy-Aware AI/ML Model Deduplication 2025 SIGMOD 4.1945683e-05
10,924 Improved Approximation Algorithms for Relational Clustering 2024 PODS 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 15 of 15 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers