Database Paper Browser

HELIX: Holistic Optimization for Accelerating Iterative Machine Learning

Summary: HELIX optimizes iterative ML workflows holistically: across iterations it caches and reuses intermediate results, or recomputes them. Workflows are written in a Scala DSL that covers data preparation, model specification, and learning. Deciding what to reuse can be cast as a MAX-FLOW problem; deciding what to cache is NP-hard and is handled with lightweight heuristics. Reported speedups reach up to 19x over DeepDive and KeystoneML on NLP, CV, and science tasks. (summarized by gpt-5-nano on Feb 09 2026)
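
Illustration (not from the paper): the reuse decision named in the summary can be pictured on a toy workflow DAG. The sketch below is a brute-force enumeration in Python, not the MAX-FLOW reduction, and is only practical for a handful of nodes. The three states (load, compute, prune), the node names, and every cost are assumptions made up for this example: a computed node needs all of its parents available, and a required output may not be pruned.

```python
from itertools import product
from math import inf

STATES = ("load", "compute", "prune")

def best_plan(nodes, parents, compute_cost, load_cost, outputs):
    """Try every state assignment and return the cheapest valid one.

    Hypothetical cost model: loading costs load_cost[n] (infinite if the
    node was never cached), computing costs compute_cost[n], pruning is free.
    """
    best_cost, best = inf, None
    for states in product(STATES, repeat=len(nodes)):
        plan = dict(zip(nodes, states))
        # Required outputs must be produced in some way.
        if any(plan[o] == "prune" for o in outputs):
            continue
        # A computed node needs every parent loaded or computed.
        if any(plan[n] == "compute" and
               any(plan[p] == "prune" for p in parents[n])
               for n in nodes):
            continue
        cost = 0
        for n, s in plan.items():
            if s == "load":
                cost += load_cost.get(n, inf)
            elif s == "compute":
                cost += compute_cost[n]
        if cost < best_cost:
            best_cost, best = cost, plan
    return best_cost, best

# Toy chain a -> b -> c; only b was cached by an earlier iteration.
nodes = ["a", "b", "c"]
parents = {"a": [], "b": ["a"], "c": ["b"]}
compute_cost = {"a": 10, "b": 5, "c": 1}
load_cost = {"b": 2}
cost, plan = best_plan(nodes, parents, compute_cost, load_cost, {"c"})
print(cost, plan)
```

In this toy case the cheapest plan loads the cached b, computes c from it, and skips a entirely, which is the kind of trade-off between loading and recomputing that the summary describes.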

Paper ID: 11978
Venue: VLDB
Year: 2019
Pagerank: 0.0001096361
Overall Rank: 1,666 | 88.42%
DOI: 10.14778/3297753.3297763

Incoming Citations (Sorted by Pagerank)

Showing 29 of 29 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,427 | Towards Scalable Dataframe Systems | 2020 | VLDB | 0.0001204248
2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05
2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05
3,023 | Helix: Accelerating Human-in-the-loop Machine Learning | 2018 | VLDB | 7.6929986e-05
3,393 | Lux: Always-on Visualization Recommendations for Exploratory Dataframe Workflows | 2022 | VLDB | 7.1483239e-05
3,625 | Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings | 2020 | SIGMOD | 6.9055212e-05
4,557 | Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches | 2021 | VLDB | 6.087611e-05
4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05
4,935 | OmniFair: A Declarative System for Model-Agnostic Group Fairness in Machine Learning | 2021 | SIGMOD | 5.8198727e-05
4,957 | Doing More with Less: Characterizing Dataset Downsampling for AutoML | 2021 | VLDB | 5.8035715e-05
6,000 | DeepEverest: Accelerating Declarative Top-K Queries for Deep Neural Network Interpretation | 2022 | VLDB | 5.2415551e-05
6,053 | Optimizing Machine Learning Workloads in Collaborative Environments | 2020 | SIGMOD | 5.2326838e-05
6,469 | Materialization and Reuse Optimizations for Production Data Science Pipelines | 2022 | SIGMOD | 5.0519488e-05
6,733 | Hindsight Logging for Model Training | 2021 | VLDB | 4.9467666e-05
7,482 | Provenance-Enabled Explainable AI | 2024 | SIGMOD | 4.7180617e-05
7,656 | Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets | 2022 | SIGMOD | 4.6871575e-05
7,704 | ExDRa: Exploratory Data Science on Federated Raw Data | 2021 | SIGMOD | 4.6733838e-05
8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05
8,257 | Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines | 2023 | SIGMOD | 4.5487511e-05
8,514 | UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads | 2022 | VLDB | 4.4944285e-05
9,223 | Intermittent Human-in-the-Loop Model Selection using Cerebro: A Demonstration | 2021 | VLDB | 4.3698672e-05
9,344 | Hippo: Sharing Computations in Hyper-Parameter Optimization | 2022 | VLDB | 4.3539442e-05
9,806 | The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format | 2024 | SIGMOD | 4.2805224e-05
9,912 | ElasticNotebook: Enabling Live Migration for Computational Notebooks | 2024 | VLDB | 4.2565279e-05
10,252 | CAPS: Cost-Aware ML Pipeline Selection | 2026 | VLDB | 4.1945683e-05
10,338 | Flow with FlorDB: Incremental Context Maintenance for the Machine Learning Lifecycle | 2025 | CIDR | 4.1945683e-05
10,469 | Alsatian: Optimizing Model Search for Deep Transfer Learning | 2025 | SIGMOD | 4.1945683e-05
11,476 | Enforcing Constraints for Machine Learning Systems via Declarative Feature Selection: An Experimental Study | 2021 | SIGMOD | 4.1945683e-05
11,691 | Enabling Data Science for the Majority | 2019 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 19 of 19 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614
37 | Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud | 2012 | VLDB | 0.0007522744
66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801
202 | LINQ: Reconciling Objects, Relations and XML in the .NET Framework | 2006 | SIGMOD | 0.00034920912
254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555
761 | Materialization Optimizations for Feature Selection Workloads | 2014 | SIGMOD | 0.00017053783
903 | To Join or Not to Join? Thinking Twice about Joins before Feature Selection | 2016 | SIGMOD | 0.0001547016
1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713
1,279 | Towards Linear Algebra over Normalized Data | 2017 | VLDB | 0.00012868394
1,413 | VisTrails: Visualization meets Data Management | 2006 | SIGMOD | 0.00012121257
1,750 | Weld: A Common Runtime for High Performance Data Analytics | 2017 | CIDR | 0.00010683647
1,873 | An Architecture for Compiling UDF-centric Workflows | 2015 | VLDB | 0.00010253002
1,922 | Selecting Subexpressions to Materialize at Datacenter Scale | 2018 | VLDB | 0.00010082599
2,152 | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | 2018 | SIGMOD | 9.4239787e-05
2,205 | ReStore: Reusing Results of MapReduce Jobs | 2012 | VLDB | 9.2920002e-05
2,915 | Brainwash: A Data System for Feature Engineering | 2013 | CIDR | 7.9078385e-05
3,023 | Helix: Accelerating Human-in-the-loop Machine Learning | 2018 | VLDB | 7.6929986e-05
4,159 | F: Regression Models over Factorized Views | 2016 | VLDB | 6.3993326e-05
4,857 | The "Big Data" Ecosystem at LinkedIn | 2013 | SIGMOD | 5.8736144e-05
