Data Management Opportunities for Foundation Models
Summary: Argues that foundation models shift ML from model-centric engineering to data-centric pipelines in which the training corpus is the differentiator, making data lifecycle management (collection, integration, curation, monitoring) the core database problem. Highlights database research opportunities and production challenges. (summarized by gpt-5-mini on Feb 09 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
- 1. Laurel Orr
- 2. Karan Goel
- 3. Christopher Ré
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956 |
| 1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533 |
| 1,940 | SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging | 2021 | SIGMOD | 0.00010020173 |
| 9,438 | Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation | 2021 | CIDR | 4.3425082e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,842 | ML-Asset Management: Curation, Discovery, and Utilization | 2025 | VLDB | 4.1945683e-05 |
| 8,637 | Machine Learning for Data Management: Problems and Solutions | 2018 | SIGMOD | 4.479892e-05 |
| 1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956 |
| 11,629 | Leveraging Organizational Resources to Adapt Models to New Data Modalities | 2020 | VLDB | 4.1945683e-05 |
| 1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681 |
| 4,003 | Data Platform for Machine Learning | 2019 | SIGMOD | 6.54347e-05 |
| 6,228 | Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems | 2021 | VLDB | 5.1470042e-05 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05 |
| 8,847 | Towards Foundation Database Models | 2025 | CIDR | 4.4371897e-05 |