Database Paper Browser


Materialization and Reuse Optimizations for Production Data Science Pipelines

Summary: Proposes budgeted materialization to precompute and store pipeline artifacts, reducing redundant data processing in retraining ML pipelines. Introduces a DAG-based reuse planner to fuse pipelines and reuse artifacts, delivering up to 10x training speedups. (summarized by gpt-5-nano on Feb 09 2026)
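The two ideas in the summary can be illustrated with a small sketch. It is not the paper's algorithm. Everything here is a hypothetical simplification: the `Artifact` fields, the greedy "recreation cost per unit of storage" ranking used for budgeted materialization, and the backward load-or-recompute walk used as a stand-in for the reuse planner. The paper's actual utility function and planner may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    # Hypothetical node in a pipeline DAG; field names and cost units are illustrative.
    name: str
    size: float          # storage consumed if the artifact is materialized
    compute_cost: float  # cost to compute it once its parents are available
    load_cost: float     # cost to read it back from storage
    parents: list = field(default_factory=list)  # names of the artifacts it consumes


def ancestors(name, dag):
    # `name` plus every artifact it transitively depends on (each counted once).
    seen, stack = set(), [name]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(dag[node].parents)
    return seen


def recreation_cost(name, dag):
    # Cost to rebuild `name` from raw inputs; shared ancestors are not double-counted.
    return sum(dag[n].compute_cost for n in ancestors(name, dag))


def materialize(dag, budget):
    # Greedy heuristic: rank artifacts by recreation cost per unit of storage and
    # keep each one that still fits in the remaining budget.
    ranked = sorted(dag.values(),
                    key=lambda a: recreation_cost(a.name, dag) / a.size,
                    reverse=True)
    chosen, used = set(), 0.0
    for a in ranked:
        if used + a.size <= budget:
            chosen.add(a.name)
            used += a.size
    return chosen


def plan(target, dag, materialized):
    # Walk backward from `target`: load a stored artifact when that is cheaper than
    # rebuilding it from scratch; otherwise compute it and visit its parents.
    steps = {}
    stack = [target]
    while stack:
        node = stack.pop()
        if node in steps:
            continue
        art = dag[node]
        if node in materialized and art.load_cost < recreation_cost(node, dag):
            steps[node] = "load"
        else:
            steps[node] = "compute"
            stack.extend(art.parents)
    return steps


# Usage: a toy four-stage pipeline, then a retraining run that adds a new model.
dag = {
    "raw": Artifact("raw", size=100, compute_cost=1, load_cost=5),
    "clean": Artifact("clean", size=80, compute_cost=20, load_cost=4, parents=["raw"]),
    "features": Artifact("features", size=10, compute_cost=50, load_cost=1, parents=["clean"]),
    "model": Artifact("model", size=1, compute_cost=200, load_cost=0.5, parents=["features"]),
}
stored = materialize(dag, budget=15)  # {"model", "features"}
dag["model_v2"] = Artifact("model_v2", size=1, compute_cost=200, load_cost=0.5,
                           parents=["features"])
steps = plan("model_v2", dag, stored)  # {"model_v2": "compute", "features": "load"}
```

In this toy run, the retraining job loads the stored features and never touches the `raw` or `clean` stages. The greedy ranking is only a knapsack-style heuristic. The planner's "cheaper than rebuilding from scratch" test ignores the possibility that some ancestors could themselves be loaded, so it can miss cheaper plans.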

Paper ID: 6493
Venue: SIGMOD
Year: 2022
Pagerank: 5.0519488e-05
Overall Rank: 6,469 | 55.00%
DOI: 10.1145/3514221.3526186



Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 18 of 18 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
126 | Space-Efficient Online Computation of Quantile Summaries | 2001 | SIGMOD | 0.00044744986
179 | Efficient and Extensible Algorithms for Multi Query Optimization | 2000 | SIGMOD | 0.00037672155
667 | Incremental Knowledge Base Construction Using DeepDive | 2015 | VLDB | 0.00018440557
761 | Materialization Optimizations for Feature Selection Workloads | 2014 | SIGMOD | 0.00017053783
947 | MRShare: Sharing Across Multiple Queries in MapReduce | 2010 | VLDB | 0.00015114576
977 | Pipelining in Multi-Query Optimization | 2001 | PODS | 0.0001488881
1,565 | Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff | 2015 | VLDB | 0.00011345567
1,666 | HELIX: Holistic Optimization for Accelerating Iterative Machine Learning | 2019 | VLDB | 0.0001096361
1,788 | On-the-Fly Sharing for Streamed Aggregation | 2006 | SIGMOD | 0.00010555742
2,152 | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | 2018 | SIGMOD | 9.4239787e-05
2,205 | ReStore: Reusing Results of MapReduce Jobs | 2012 | VLDB | 9.2920002e-05
3,378 | General Incremental Sliding-Window Aggregation | 2015 | VLDB | 7.1622572e-05
3,703 | Multi-Query Optimization in MapReduce Framework | 2014 | VLDB | 6.8289978e-05
4,576 | The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox | 2015 | CIDR | 6.0721464e-05
6,053 | Optimizing Machine Learning Workloads in Collaborative Environments | 2020 | SIGMOD | 5.2326838e-05
6,330 | Efficient Construction of Approximate Ad-Hoc ML models Through Materialization and Reuse | 2018 | VLDB | 5.1077416e-05
8,075 | AJoin: Ad-hoc Stream Joins at Scale | 2020 | VLDB | 4.5917655e-05
8,653 | ApproxML: Efficient Approximate Ad-Hoc ML Models Through Materialization and Reuse | 2019 | VLDB | 4.475291e-05

Semantically Similar Papers