Optimizing Data Pipelines for Machine Learning in Feature Stores
Summary: Introduces DB-style optimizations for feature stores targeting point-in-time joins to reduce resource use and speed up ML data pipelines. Implemented in Feathr and evaluated on TPCx-AI and real retail workloads, achieving up to 3× pipeline acceleration.
(summarized by gpt-5-mini on Feb 09 2026)
- Paper ID: 13292
- Venue: VLDB
- Year: 2023
- Pagerank: 5.4305348e-05
- Overall Rank: 5,567 | 61.28%
- DOI: 10.14778/3625054.3625060
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 21 of 21 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 95 | Maintaining Views Incrementally | 1993 | SIGMOD | 0.00050896659 |
| 158 | Automated Selection of Materialized Views and Indexes for SQL Databases | 2000 | VLDB | 0.00040071492 |
| 481 | Incremental Maintenance of Views with Duplicates | 1995 | SIGMOD | 0.00022167223 |
| 731 | Optimizing Queries Using Materialized Views: A Practical, Scalable Solution | 2001 | SIGMOD | 0.00017468889 |
| 761 | Materialization Optimizations for Feature Selection Workloads | 2014 | SIGMOD | 0.00017053783 |
| 1,059 | Answering Complex SQL Queries Using Automatic Summary Tables | 2000 | SIGMOD | 0.00014382575 |
| 1,155 | A Scalable Algorithm for Answering Queries Using Views | 2000 | VLDB | 0.00013616518 |
| 1,911 | Algorithms for Materialized View Design in Data Warehousing Environment | 1997 | VLDB | 0.00010120234 |
| 1,922 | Selecting Subexpressions to Materialize at Datacenter Scale | 2018 | VLDB | 0.00010082599 |
| 2,401 | Physical Data Independence, Constraints, and Optimization with Universal Plans | 1999 | VLDB | 8.8954126e-05 |
| 3,875 | Cloudy with High Chance of DBMS: A 10-year Prediction for Enterprise-Grade ML | 2020 | CIDR | 6.675257e-05 |
| 4,174 | Computation Reuse in Analytics Job Service at Microsoft | 2018 | SIGMOD | 6.3856219e-05 |
| 4,966 | Relative Error Streaming Quantiles | 2021 | PODS | 5.7959749e-05 |
| 5,605 | TPCx-AI - An Industry Standard Benchmark for Artificial Intelligence and Machine Learning Systems | 2023 | VLDB | 5.4142007e-05 |
| 5,627 | KLL± Approximate Quantile Sketches over Dynamic Datasets | 2021 | VLDB | 5.403782e-05 |
| 6,228 | Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems | 2021 | VLDB | 5.1470042e-05 |
| 6,247 | Optimizing In-memory Database Engine for AI-powered On-line Decision Augmentation Using Persistent Memory | 2021 | VLDB | 5.1389201e-05 |
| 6,469 | Materialization and Reuse Optimizations for Production Data Science Pipelines | 2022 | SIGMOD | 5.0519488e-05 |
| 8,514 | UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads | 2022 | VLDB | 4.4944285e-05 |
| 8,826 | Delta: Scalable Data Dissemination under Capacity Constraints | 2014 | VLDB | 4.441364e-05 |
| 9,344 | Hippo: Sharing Computations in Hyper-Parameter Optimization | 2022 | VLDB | 4.3539442e-05 |
Semantically Similar Papers