Demonstration of Santoku: Optimizing Machine Learning over Normalized Data
Summary: Demonstrates Santoku, a toolkit that optimizes ML over normalized data. It uses factorized learning and automatically decides whether to denormalize the data or push the ML computation through the joins. It also leverages functional dependencies (FDs) to surface feature insights, and it ships as an R library. (summarized by gpt-5-nano on Feb 09 2026)
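As a rough illustration of the trade-off the summary describes, the sketch below scores a linear model over a key-foreign key join in two ways: by denormalizing first, and by pushing the computation through the join. The tables, weights, and function names are hypothetical toy values chosen for this example; this is not Santoku's API (Santoku is an R library), only a minimal Python sketch of the general idea.

```python
# Illustrative sketch only: hypothetical toy tables, not Santoku's API.
# S is the fact table: each row is (foreign key, features local to S).
# R is the dimension table: key -> features local to R.
S = [(0, [1.0, 2.0]), (1, [0.5, -1.0]), (0, [3.0, 0.0])]
R = {0: [2.0, 1.0, 0.0], 1: [-1.0, 4.0, 2.0]}

w_S = [0.1, -0.2]      # assumed weights for S's features
w_R = [0.3, 0.0, 0.5]  # assumed weights for R's features


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def scores_materialized():
    """Denormalize first: join S with R, then score every wide row."""
    w = w_S + w_R
    return [dot(w, x_s + R[fk]) for fk, x_s in S]


def scores_factorized():
    """Push the computation through the join: score each R row once,
    then reuse that partial result for every S row that references it."""
    partial = {key: dot(w_R, x_r) for key, x_r in R.items()}
    return [dot(w_S, x_s) + partial[fk] for fk, x_s in S]


# Both strategies produce the same scores (up to floating-point rounding).
assert all(
    abs(a - b) < 1e-9
    for a, b in zip(scores_materialized(), scores_factorized())
)
```

The factorized version avoids repeating the work on R's features for every matching S row, which tends to pay off when many S rows share the same foreign key. When that redundancy is small, materializing the join may be just as cheap, which is why a choice between the two strategies is useful.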
Authors
1. Arun Kumar
2. Mona Jalal
3. Boqun Yan
4. Jeffrey Naughton
5. Jignesh M. Patel
Incoming Citations (Sorted by Pagerank)
Showing 9 of 9 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,279 | Towards Linear Algebra over Normalized Data | 2017 | VLDB | 0.00012868394 |
| 1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681 |
| 2,194 | Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra | 2019 | SIGMOD | 0.000093138337 |
| 4,129 | Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers? | 2018 | VLDB | 0.00006428887 |
| 4,159 | F: Regression Models over Factorized Views | 2016 | VLDB | 0.000063993326 |
| 7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 0.000048078895 |
| 8,595 | Towards A Polyglot Framework for Factorized ML | 2021 | VLDB | 0.000044889397 |
| 8,864 | Cerebro: A Layered Data Platform for Scalable Deep Learning | 2021 | CIDR | 0.000044326439 |
| 9,222 | Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning | 2021 | VLDB | 0.000043698672 |
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 761 | Materialization Optimizations for Feature Selection Workloads | 2014 | SIGMOD | 0.00017053783 |
| 1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713 |
| 2,915 | Brainwash: A Data System for Feature Engineering | 2013 | CIDR | 0.000079078385 |
| 7,273 | Feature Selection in Enterprise Analytics: A Demonstration using an R-based Data Analytics System | 2013 | VLDB | 0.000047810804 |