Data Platform for Machine Learning
Summary: MLdp is a purpose-built data management platform for ML datasets, with a minimalist data model, versioning, and data provenance for reproducible experiments. Distinct from MLaaS, it integrates with major ML frameworks and adds privacy/audit controls. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Pulkit Agrawal
2. Rajat Arya
3. Aanchal Bindal
4. Sandeep Bhatia
5. Anupriya Gagneja
6. Joseph Godlewski
7. Yucheng Low
8. Timothy Muss
9. Mudit Manu Paliwal
10. Sethu Raman
11. Vishrut Shah
12. Bochao Shen
13. Laura Sugden
14. Kaiyu Zhao
15. Ming-Chuan Wu
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,508 | SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines | 2024 | VLDB | 7.0271496e-05 |
| 4,196 | Overton: A Data System for Monitoring and Improving Machine-Learned Products | 2020 | CIDR | 6.3686231e-05 |
| 8,163 | Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science | 2021 | VLDB | 4.5723431e-05 |
| 9,118 | Towards Observability for Production Machine Learning Pipelines | 2022 | VLDB | 4.3928288e-05 |
| 11,149 | Git is for Data | 2023 | CIDR | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 543 | MLbase: A Distributed Machine-learning System | 2013 | CIDR | 0.00020526854 |
| 557 | SystemML: Declarative Machine Learning on Spark | 2016 | VLDB | 0.00020197988 |
| 610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674 |
| 1,281 | DataHub: Collaborative Data Science & Dataset Version Management at Scale | 2015 | CIDR | 0.00012854744 |
| 2,430 | Decibel: The Relational Dataset Branching System | 2016 | VLDB | 8.8330417e-05 |
| 5,271 | ORPHEUSDB: A Lightweight Approach to Relational Dataset Versioning | 2017 | SIGMOD | 5.5941385e-05 |
| 7,745 | Crossing the finish line faster when paddling the Data Lake with KAYAK | 2017 | VLDB | 4.6618625e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,317 | Data Management Opportunities for Foundation Models | 2022 | CIDR | 4.1945683e-05 |
| 4,557 | Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches | 2021 | VLDB | 6.087611e-05 |
| 9,776 | Structure-Aware Machine Learning over Multi-Relational Databases | 2021 | SIGMOD | 4.2856106e-05 |
| 2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05 |
| 7,411 | ItemSuggest: A Data Management Platform for Machine Learned Ranking Services | 2019 | CIDR | 4.7364436e-05 |
| 9,236 | The Hopsworks Feature Store for Machine Learning | 2024 | SIGMOD | 4.3690661e-05 |
| 9,118 | Towards Observability for Production Machine Learning Pipelines | 2022 | VLDB | 4.3928288e-05 |
| 11,313 | Towards Observability for Machine Learning Pipelines | 2022 | CIDR | 4.1945683e-05 |
| 543 | MLbase: A Distributed Machine-learning System | 2013 | CIDR | 0.00020526854 |
| 1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681 |