Modyn: Data-Centric Machine Learning Pipeline Orchestration
Summary: Modyn is a data-centric ML platform for growing datasets that lets users declaratively configure training with data-selection and triggering policies. It provides composite models for fair evaluation, open benchmarks, and high-throughput, sample-level data selection.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 7050
- Venue: SIGMOD
- Year: 2025
- Pagerank: 4.3690661e-05
- Overall Rank: 9,231 | 35.79%
- DOI: 10.1145/3709705
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 12 of 12 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 44 | The Design Of Postgres | 1986 | SIGMOD | 0.00071838587 |
| 1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533 |
| 2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05 |
| 2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05 |
| 2,688 | Accelerating Recommendation System Training by Leveraging Popular Choices | 2022 | VLDB | 8.2991144e-05 |
| 4,110 | Learning to Validate the Predictions of Black Box Classifiers on Unseen Data | 2020 | SIGMOD | 6.4428544e-05 |
| 4,424 | PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models | 2020 | SIGMOD | 6.198474e-05 |
| 7,258 | Incremental Tabular Learning on Heterogeneous Feature Space | 2023 | SIGMOD | 4.7865674e-05 |
| 8,163 | Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science | 2021 | VLDB | 4.5723431e-05 |
| 8,177 | DORIAN in action: Assisted Design of Data Science Pipelines | 2022 | VLDB | 4.5673266e-05 |
| 8,257 | Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines | 2023 | SIGMOD | 4.5487511e-05 |
| 9,118 | Towards Observability for Production Machine Learning Pipelines | 2022 | VLDB | 4.3928288e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05 |
| 9,118 | Towards Observability for Production Machine Learning Pipelines | 2022 | VLDB | 4.3928288e-05 |
| 6,469 | Materialization and Reuse Optimizations for Production Data Science Pipelines | 2022 | SIGMOD | 5.0519488e-05 |
| 10,770 | cedar: Optimized and Unified Machine Learning Input Data Pipelines | 2025 | VLDB | 4.1945683e-05 |
| 2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05 |
| 8,257 | Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines | 2023 | SIGMOD | 4.5487511e-05 |
| 11,313 | Towards Observability for Machine Learning Pipelines | 2022 | CIDR | 4.1945683e-05 |
| 2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05 |
| 4,003 | Data Platform for Machine Learning | 2019 | SIGMOD | 6.54347e-05 |
| 7,311 | The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development | 2020 | SIGMOD | 4.7656884e-05 |