CAPS: Cost-Aware ML Pipeline Selection
Summary: CAPS adds cost-aware pipeline selection to AutoML, orthogonal to the underlying search strategy, via lightweight time/cost estimation. It models candidate pipelines as a directed hypergraph and solves a constrained prize-collecting subset problem with a greedy approximation, cutting waste by up to 4x.
(summarized by gpt-5.4-mini on May 27 2026)
- Paper ID: 14289
- Venue: VLDB
- Year: 2026
- Pagerank: 4.1945683e-05
- Overall Rank: 10,252 | 28.68%
- DOI: 10.14778/3801059.3801060
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
Outgoing Citations (Sorted by Pagerank)
Showing 12 of 12 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| 921 | Democratizing Data Science through Interactive Curation of ML Pipelines | 2019 | SIGMOD | 0.00015337438 |
| 1,666 | HELIX: Holistic Optimization for Accelerating Iterative Machine Learning | 2019 | VLDB | 0.0001096361 |
| 2,152 | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | 2018 | SIGMOD | 9.4239787e-05 |
| 2,384 | Oracle AutoML: A Fast and Predictive AutoML Pipeline | 2020 | VLDB | 8.925354e-05 |
| 3,711 | Saga: A Platform for Continuous Construction and Serving of Knowledge At Scale | 2022 | SIGMOD | 6.823609e-05 |
| 4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05 |
| 4,957 | Doing More with Less: Characterizing Dataset Downsampling for AutoML | 2021 | VLDB | 5.8035715e-05 |
| 5,567 | Optimizing Data Pipelines for Machine Learning in Feature Stores | 2023 | VLDB | 5.4305348e-05 |
| 6,053 | Optimizing Machine Learning Workloads in Collaborative Environments | 2020 | SIGMOD | 5.2326838e-05 |
| 7,494 | SubStrat: A Subset-Based Optimization Strategy for Faster AutoML | 2023 | VLDB | 4.7180617e-05 |
| 8,177 | DORIAN in action: Assisted Design of Data Science Pipelines | 2022 | VLDB | 4.5673266e-05 |
| 9,231 | Modyn: Data-Centric Machine Learning Pipeline Orchestration | 2025 | SIGMOD | 4.3690661e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| 6,986 | A Cost-based Optimizer for Gradient Descent Optimization | 2017 | SIGMOD | 4.8727048e-05 |
| 7,311 | The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development | 2020 | SIGMOD | 4.7656884e-05 |
| 7,494 | SubStrat: A Subset-Based Optimization Strategy for Faster AutoML | 2023 | VLDB | 4.7180617e-05 |
| 2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05 |
| 5,429 | DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data | 2023 | SIGMOD | 5.5087325e-05 |
| 4,957 | Doing More with Less: Characterizing Dataset Downsampling for AutoML | 2021 | VLDB | 5.8035715e-05 |
| 8,828 | HAIPipe: Combining Human-generated and Machine-generated Pipelines for Data Preparation | 2023 | SIGMOD | 4.4407488e-05 |
| 8,743 | CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning | 2024 | SIGMOD | 4.456315e-05 |
| 2,384 | Oracle AutoML: A Fast and Predictive AutoML Pipeline | 2020 | VLDB | 8.925354e-05 |
| 5,304 | A Scalable AutoML Approach Based on Graph Neural Networks | 2022 | VLDB | 5.5779335e-05 |