Database Paper Browser

Towards Linear Algebra over Normalized Data

Summary: Introduces linear algebra over normalized data via a new logical data type and algebraic rewrite rules that convert linear algebra operations over denormalized (joined) data into operations over normalized data. This enables automatic factorization of diverse ML algorithms over relational data, with speedups of up to 36x in prototypes built on R and an R-over-RDBMS tool. (summarized by gpt-5-nano on Feb 09 2026)
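
The rewrite idea in the summary can be illustrated with a minimal sketch. It is not taken from the paper's code. It assumes a two-table schema where an entity table S holds a foreign key fk into an attribute table R, so the joined matrix is T = [S | R[fk]]. The names S, fk, R, w and both helper functions are hypothetical, chosen only for this example. The sketch shows that the product T·w can be computed as S·w_S + (R·w_R)[fk] without building T.

```python
# Illustrative sketch only: names and schema are assumptions, not the paper's API.

def matvec(rows, w):
    """Multiply a row-major matrix (list of lists) by a vector."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in rows]

def materialized_matvec(S, fk, R, w):
    """Baseline: materialize the join T = [S | R[fk]], then compute T @ w."""
    T = [s_row + R[k] for s_row, k in zip(S, fk)]
    return matvec(T, w)

def factorized_matvec(S, fk, R, w):
    """Rewritten form: S @ w_S + (R @ w_R)[fk], without building T."""
    d_S = len(S[0])                 # number of feature columns in S
    w_S, w_R = w[:d_S], w[d_S:]     # split weights by source table
    partial_S = matvec(S, w_S)
    partial_R = matvec(R, w_R)      # computed once per row of R, not per row of T
    return [ps + partial_R[k] for ps, k in zip(partial_S, fk)]

# Tiny example: 4 rows in S, each referencing one of 2 rows in R.
S = [[1, 2], [3, 4], [5, 6], [7, 8]]
fk = [0, 1, 0, 1]
R = [[10, 20, 30], [40, 50, 60]]
w = [1, 2, 3, 4, 5]

print(factorized_matvec(S, fk, R, w) == materialized_matvec(S, fk, R, w))
```

The factorized version multiplies each row of R by w_R once, while the materialized version repeats that work for every row of S that references it. Avoiding this redundancy is the saving that factorized approaches aim for.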

Paper ID: 11407
Venue: VLDB
Year: 2017
Pagerank: 0.00012868394
Overall Rank: 1,279 | 91.11%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 34 of 34 citing papers.

Rank Citing Paper Year Venue Pagerank
1,532 Data Management in Machine Learning: Challenges, Techniques, and Systems 2017 SIGMOD 0.00011472681
1,666 HELIX: Holistic Optimization for Accelerating Iterative Machine Learning 2019 VLDB 0.0001096361
2,154 DIFF: A Relational Interface for Large-Scale Data Explanation 2019 VLDB 9.4208667e-05
2,194 Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra 2019 SIGMOD 9.3138337e-05
2,350 An Intermediate Representation for Optimizing Machine Learning Pipelines 2019 VLDB 8.9788641e-05
2,886 VISTA: Optimized System for Declarative Feature Transfer from Deep CNNs at Scale 2020 SIGMOD 7.9612767e-05
3,099 DB4ML – An In-Memory Database Kernel with Machine Learning Support 2020 SIGMOD 7.5642871e-05
3,148 ARM-Net: Adaptive Relation Modeling Network for Structured Data 2021 SIGMOD 7.4751269e-05
3,277 A Layered Aggregate Engine for Analytics Workloads 2019 SIGMOD 7.2871625e-05
3,727 Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection 2022 VLDB 6.8141709e-05
3,948 A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics 2018 VLDB 6.5959084e-05
4,154 Robust and Transferable Log-based Anomaly Detection 2023 SIGMOD 6.4032498e-05
4,197 Incremental View Maintenance with Triple Lock Factorization Benefits 2018 SIGMOD 6.367895e-05
5,084 In-Database Machine Learning with CorgiPile: Stochastic Gradient Descent without Full Data Shuffle 2022 SIGMOD 5.7091191e-05
6,156 Optimizing Tensor Programs on Flexible Storage 2023 SIGMOD 5.1802603e-05
6,538 Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent 2019 SIGMOD 5.023239e-05
6,541 ConnectorX: Accelerating Data Loading From Databases to Dataframes 2022 VLDB 5.0216945e-05
6,549 Demonstration of Nimbus: Model-based Pricing for Machine Learning in a Data Marketplace 2019 SIGMOD 5.0175568e-05
6,745 DistME: A Fast and Elastic Distributed Matrix Computation Engine using GPUs 2019 SIGMOD 4.9417155e-05
7,179 Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning 2023 VLDB 4.8078895e-05
8,279 Galley: Modern Query Optimization for Sparse Tensor Programs 2025 SIGMOD 4.5435639e-05
8,595 Towards A Polyglot Framework for Factorized ML 2021 VLDB 4.4889397e-05
8,786 AWARE: Workload-aware, Redundancy-exploiting Linear Algebra 2023 SIGMOD 4.4521262e-05
8,864 Cerebro: A Layered Data Platform for Scalable Deep Learning 2021 CIDR 4.4326439e-05
8,980 HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries 2021 SIGMOD 4.4169807e-05
9,222 Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning 2021 VLDB 4.3698672e-05
9,856 In-Database Data Imputation 2024 SIGMOD 4.269353e-05
10,122 TranSQL+: Serving Large Language Models with SQL on Low-Resource Hardware 2026 SIGMOD 4.1945683e-05
10,177 InferF: Declarative Factorization of AI/ML Inferences over Joins 2026 SIGMOD 4.1945683e-05
10,499 Privacy and Accuracy-Aware AI/ML Model Deduplication 2025 SIGMOD 4.1945683e-05
11,187 Regularized Pairwise Relationship based Analytics for Structured Data 2023 SIGMOD 4.1945683e-05
11,282 Demonstration of OpenDBML, a Framework for Democratizing In-Database Machine Learning 2023 VLDB 4.1945683e-05
11,312 Amalur: Next-generation Data Integration in Data Lakes 2022 CIDR 4.1945683e-05
11,363 Givens QR Decomposition over Relational Databases 2022 SIGMOD 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 20 of 20 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
51 Including Group-By in Query Optimization 1994 VLDB 0.00067123727
140 The MADlib Analytics Library or MAD Skills, the SQL 2012 VLDB 0.00042270404
248 Eager Aggregation and Lazy Aggregation 1995 VLDB 0.00030785339
543 MLbase: A Distributed Machine-learning System 2013 CIDR 0.00020526854
557 SystemML: Declarative Machine Learning on Spark 2016 VLDB 0.00020197988
658 Towards a Unified Architecture for in-RDBMS Analytics 2012 SIGMOD 0.00018506577
834 Learning Linear Regression Models over Factorized Joins 2016 SIGMOD 0.00016135159
850 Scaling Factorization Machines to Relational Data 2013 VLDB 0.00015955971
903 To Join or Not to Join? Thinking Twice about Joins before Feature Selection 2016 SIGMOD 0.0001547016
1,158 Simulation of Database-Valued Markov Chains Using SimSQL 2013 SIGMOD 0.0001361064
1,167 Learning Generalized Linear Models Over Normalized Data 2015 SIGMOD 0.00013547713
1,239 A Demonstration of SciDB: A Science-Oriented DBMS 2009 VLDB 0.00013102195
1,402 Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML 2014 VLDB 0.00012180605
1,967 Compressed Linear Algebra for Large-Scale Machine Learning 2016 VLDB 9.9131712e-05
2,255 LINVIEW: Incremental View Maintenance for Complex Analytical Queries 2014 SIGMOD 9.1884983e-05
3,082 FDB: A Query Engine for Factorised Relational Databases 2012 VLDB 7.6014248e-05
3,455 A Comparison of Platforms for Implementing and Running Very Large Scale Machine Learning Algorithms 2014 SIGMOD 7.0771839e-05
4,159 F: Regression Models over Factorized Views 2016 VLDB 6.3993326e-05
4,505 SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning 2017 CIDR 6.1327108e-05
4,785 Demonstration of Santoku: Optimizing Machine Learning over Normalized Data 2015 VLDB 5.9236989e-05
