Optimizing Machine Learning Workloads in Collaborative Environments
Summary: Introduces the Experiment Graph (EG), which persists artifacts (data and models) as vertices and ML operations as edges for collaborative ML workloads. Proposes two materialization strategies and a linear-time reuse algorithm to cache artifacts and plan execution, yielding up to 10x speedups on repeated workloads and about 50% on edited ones. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Citations (Sorted by Pagerank)
Showing 10 of 10 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 10 of 10 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 761 | Materialization Optimizations for Feature Selection Workloads | 2014 | SIGMOD | 0.00017053783 |
| 921 | Democratizing Data Science through Interactive Curation of ML Pipelines | 2019 | SIGMOD | 0.00015337438 |
| 1,281 | DataHub: Collaborative Data Science & Dataset Version Management at Scale | 2015 | CIDR | 0.00012854744 |
| 1,565 | Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff | 2015 | VLDB | 0.00011345567 |
| 1,666 | HELIX: Holistic Optimization for Accelerating Iterative Machine Learning | 2019 | VLDB | 0.0001096361 |
| 2,152 | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | 2018 | SIGMOD | 0.000094239787 |
| 2,205 | ReStore: Reusing Results of MapReduce Jobs | 2012 | VLDB | 0.000092920002 |
| 2,269 | Ground: A Data Context Service | 2017 | CIDR | 0.00009147379 |
| 3,023 | Helix: Accelerating Human-in-the-loop Machine Learning | 2018 | VLDB | 0.000076929986 |
| 6,981 | Dataset Relationship Management | 2019 | CIDR | 0.000048743957 |