Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems
Summary: This paper examines how feature stores for ML pipelines are expanding from traditional tabular features toward embedding ecosystems. It pinpoints embedding-specific gaps (training data management, embedding quality assessment, and downstream monitoring) that standard feature stores don't cover, and surveys candidate solutions. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Laurel Orr
2. Atindriyo Sanyal
3. Xiao Ling
4. Karan Goel
5. Megan Leszczynski
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,934 | From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management | 2022 | VLDB | 5.8198826e-05 |
| 5,567 | Optimizing Data Pipelines for Machine Learning in Feature Stores | 2023 | VLDB | 5.4305348e-05 |
| 8,182 | SHiFT: An Efficient, Flexible Search Engine for Transfer Learning | 2023 | VLDB | 4.5659133e-05 |
| 8,514 | UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads | 2022 | VLDB | 4.4944285e-05 |
| 9,364 | FEBench: A Benchmark for Real-Time Relational Data Feature Extraction | 2023 | VLDB | 4.3502487e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 3.0540555e-04 |
| 300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 2.8441466e-04 |
| 1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 1.1869295e-04 |
| 4,196 | Overton: A Data System for Monitoring and Improving Machine-Learned Products | 2020 | CIDR | 6.3686231e-05 |
| 9,438 | Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation | 2021 | CIDR | 4.3425082e-05 |