Database Paper Browser


Learning Linear Regression Models over Factorized Joins

Summary: Learns linear regression models over training data defined by arbitrary joins, using factorized representations of the join result. Proposes three variants (F/FDB, F, and F/SQL) that factorize the computation of cofactors, decouple cofactor computation from the convergence of gradient descent, and exploit the commutativity of cofactor computation with join and union. Factorized joins can be exponentially cheaper than flat joins, and the reported speedups over MADlib, StatsModels, and R reach up to 1000x. (summarized by gpt-5-nano on Feb 09 2026)
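The idea in the summary (aggregate the cofactors once, then run gradient descent on the cofactors alone) can be illustrated with a small sketch. This is a toy example under stated assumptions, not the paper's F/FDB, F, or F/SQL implementation: the relations `R(k, a)` and `S(k, b)`, the single-feature model b ≈ t0 + t1·a, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy relations sharing a join key k: R(k, a) and S(k, b).
R = [(1, 1.0), (1, 2.0), (2, 3.0)]
S = [(1, 10.0), (1, 20.0), (2, 30.0), (2, 40.0)]

def cofactors_flat(R, S):
    """Materialize the join, then aggregate (n, sum a, sum b, sum a^2, sum a*b)."""
    rows = [(a, b) for k1, a in R for k2, b in S if k1 == k2]
    return (
        float(len(rows)),
        sum(a for a, _ in rows),
        sum(b for _, b in rows),
        sum(a * a for a, _ in rows),
        sum(a * b for a, b in rows),
    )

def cofactors_factorized(R, S):
    """Compute the same aggregates from per-key partial sums, never building the join."""
    r = defaultdict(lambda: [0.0, 0.0, 0.0])  # k -> [count, sum a, sum a^2]
    s = defaultdict(lambda: [0.0, 0.0])       # k -> [count, sum b]
    for k, a in R:
        r[k][0] += 1.0
        r[k][1] += a
        r[k][2] += a * a
    for k, b in S:
        s[k][0] += 1.0
        s[k][1] += b
    n = sa = sb = saa = sab = 0.0
    for k in r.keys() & s.keys():  # only keys present on both sides join
        cr, ra, raa = r[k]
        cs, bs = s[k]
        n += cr * cs     # each a pairs with every b under the same key
        sa += ra * cs
        sb += cr * bs
        saa += raa * cs
        sab += ra * bs   # sum over pairs of a*b = (sum a) * (sum b)
    return (n, sa, sb, saa, sab)

def fit(cof, lr=0.1, steps=5000):
    """Batch gradient descent for b ~ t0 + t1*a that touches only the cofactors."""
    n, sa, sb, saa, sab = cof
    t0 = t1 = 0.0
    for _ in range(steps):
        g0 = (t0 * n + t1 * sa - sb) / n
        g1 = (t0 * sa + t1 * saa - sab) / n
        t0, t1 = t0 - lr * g0, t1 - lr * g1
    return t0, t1

theta = fit(cofactors_factorized(R, S))
```

In this sketch the per-iteration cost of `fit` is independent of the join size, and `cofactors_factorized` reads each input tuple once, whereas `cofactors_flat` pays for every joined row. The learning rate and step count are arbitrary choices that happen to converge on this toy data.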

Paper ID: 5130
Venue: SIGMOD
Year: 2016
Pagerank: 0.00016135159
Overall Rank: 834 | 94.20%
DOI: 10.1145/2882903.2882939

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 50 of 56 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,056 | The Dynamic Yannakakis Algorithm: Compact and Efficient Query Processing Under Updates | 2017 | SIGMOD | 0.0001441128
1,279 | Towards Linear Algebra over Normalized Data | 2017 | VLDB | 0.00012868394
1,369 | Random Sampling over Joins Revisited | 2018 | SIGMOD | 0.00012339777
1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681
2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05
2,194 | Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra | 2019 | SIGMOD | 9.3138337e-05
2,501 | DBEst: Revisiting Approximate Query Processing Engines with Machine Learning Models | 2019 | SIGMOD | 8.6453446e-05
2,886 | VISTA: Optimized System for Declarative Feature Transfer from Deep CNNs at Scale | 2020 | SIGMOD | 7.9612767e-05
3,099 | DB4ML – An In-Memory Database Kernel with Machine Learning Support | 2020 | SIGMOD | 7.5642871e-05
3,277 | A Layered Aggregate Engine for Analytics Workloads | 2019 | SIGMOD | 7.2871625e-05
3,878 | Data Canopy: Accelerating Exploratory Statistical Analysis | 2017 | SIGMOD | 6.6731435e-05
3,958 | MLog: Towards Declarative In-Database Machine Learning | 2017 | VLDB | 6.5897636e-05
4,129 | Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers? | 2018 | VLDB | 6.428887e-05
4,159 | F: Regression Models over Factorized Views | 2016 | VLDB | 6.3993326e-05
4,197 | Incremental View Maintenance with Triple Lock Factorization Benefits | 2018 | SIGMOD | 6.367895e-05
4,395 | Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models | 2017 | VLDB | 6.2244283e-05
4,402 | Smurf: Self-Service String Matching Using Random Forests | 2019 | VLDB | 6.2195162e-05
4,505 | SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning | 2017 | CIDR | 6.1327108e-05
4,613 | F-IVM: Learning over Fast-Evolving Relational Data | 2020 | SIGMOD | 6.0478676e-05
4,787 | The Relational Data Borg is Learning | 2020 | VLDB | 5.9224501e-05
5,084 | In-Database Machine Learning with CorgiPile: Stochastic Gradient Descent without Full Data Shuffle | 2022 | SIGMOD | 5.7091191e-05
5,487 | SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra | 2020 | VLDB | 5.4791501e-05
5,576 | Conjunctive Queries with Inequalities Under Updates | 2018 | VLDB | 5.426344e-05
5,806 | BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees | 2019 | SIGMOD | 5.3200643e-05
5,855 | Optimal Join Algorithms Meet Top-k | 2020 | SIGMOD | 5.3006096e-05
5,951 | PGMJoins: Random Join Sampling with Graphical Models | 2021 | SIGMOD | 5.2592385e-05
5,955 | LMFAO: An Engine for Batches of Group-By Aggregates | 2020 | VLDB | 5.2572882e-05
5,962 | Beyond Equi-joins: Ranking, Enumeration and Factorization | 2021 | VLDB | 5.2536266e-05
6,077 | The Fast and the Private: Task-based Dataset Search | 2024 | CIDR | 5.2229324e-05
6,538 | Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent | 2019 | SIGMOD | 5.023239e-05
7,076 | Mining Approximate Acyclic Schemes from Relations | 2020 | SIGMOD | 4.8426354e-05
7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 4.8078895e-05
7,491 | Saibot: A Differentially Private Data Search Platform | 2023 | VLDB | 4.7180617e-05
7,920 | JoinBoost: Grow Trees Over Normalized Data Using Only SQL | 2023 | VLDB | 4.6163888e-05
8,026 | ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Join Algorithms via Reinforcement Learning | 2023 | VLDB | 4.6030518e-05
8,279 | Galley: Modern Query Optimization for Sparse Tensor Programs | 2025 | SIGMOD | 4.5435639e-05
8,589 | Output-Optimal Algorithms for Join-Aggregate Queries | 2025 | PODS | 4.4897014e-05
8,595 | Towards A Polyglot Framework for Factorized ML | 2021 | VLDB | 4.4889397e-05
8,786 | AWARE: Workload-aware, Redundancy-exploiting Linear Algebra | 2023 | SIGMOD | 4.4521262e-05
9,222 | Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning | 2021 | VLDB | 4.3698672e-05
9,391 | Database as Runtime: Compiling LLMs to SQL for In-database Model Serving | 2025 | SIGMOD | 4.3441378e-05
9,469 | DimBoost: Boosting Gradient Boosting Decision Tree to Higher Dimensions | 2018 | SIGMOD | 4.3342363e-05
9,486 | Quantifying the Loss of Acyclic Join Dependencies | 2023 | PODS | 4.3341665e-05
9,849 | Reptile: Aggregation-level Explanations for Hierarchical Data | 2022 | SIGMOD | 4.2721228e-05
9,856 | In-Database Data Imputation | 2024 | SIGMOD | 4.269353e-05
10,003 | Clustering with Set Outliers and Applications in Relational Clustering | 2026 | PODS | 4.1945683e-05
10,177 | InferF: Declarative Factorization of AI/ML Inferences over Joins | 2026 | SIGMOD | 4.1945683e-05
10,269 | Database Views as Explanations for Relational Deep Learning | 2026 | VLDB | 4.1945683e-05
10,291 | Morphing-based Compression for Data-centric ML Pipelines | 2026 | VLDB | 4.1945683e-05
10,339 | A Lower Bound on Unambiguous Context Free Grammars via Communication Complexity | 2025 | PODS | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 15 of 15 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers