Database Paper Browser

Optimizing Data Pipelines for Machine Learning in Feature Stores

Summary: Introduces database-style optimizations for point-in-time joins in feature stores, reducing resource use and speeding up ML data pipelines. The optimizations are implemented in Feathr and evaluated on TPCx-AI and real retail workloads, accelerating pipelines by up to 3×. (summarized by gpt-5-mini on Feb 09 2026)
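
For readers unfamiliar with the term, a point-in-time join attaches to each observation the most recent feature value recorded at or before the observation's timestamp, so that no future information leaks into training data. The following is a minimal sketch of that general idea in plain Python. It is not Feathr's API and not the paper's optimized method; the function name `point_in_time_join` and the tuple layouts are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(observations, feature_rows):
    """Illustrative point-in-time join (hypothetical helper, not a library API).

    observations: list of (entity_id, timestamp)
    feature_rows: list of (entity_id, timestamp, value)
    Returns a list of (entity_id, timestamp, value_or_None).
    """
    # Group feature rows by entity and sort each group by timestamp.
    by_entity = defaultdict(list)
    for entity_id, ts, value in feature_rows:
        by_entity[entity_id].append((ts, value))
    timestamps = {}
    for entity_id, rows in by_entity.items():
        rows.sort(key=lambda row: row[0])
        timestamps[entity_id] = [ts for ts, _ in rows]

    joined = []
    for entity_id, obs_ts in observations:
        rows = by_entity.get(entity_id, [])
        # Index of the first feature row strictly after the observation time.
        idx = bisect_right(timestamps.get(entity_id, []), obs_ts)
        # The row just before that index is the latest one at or before obs_ts.
        value = rows[idx - 1][1] if idx > 0 else None
        joined.append((entity_id, obs_ts, value))
    return joined


features = [("u1", 10, 1.0), ("u1", 20, 2.0), ("u2", 15, 5.0)]
observations = [("u1", 15), ("u1", 20), ("u1", 5), ("u2", 30), ("u3", 1)]
result = point_in_time_join(observations, features)
```

In this sketch an observation with no earlier feature row (or an unknown entity) gets `None`; a production feature store would typically also bound how stale a matched value may be.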

Paper ID: 13292
Venue: VLDB
Year: 2023
Pagerank: 5.4305348e-05
Overall Rank: 5,567 | 61.28%
DOI: 10.14778/3625054.3625060

[Chart: Incoming Non-self Citations Over Time]

Authors

Incoming Citations (Sorted by Pagerank)

Showing 3 of 3 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
9,236 | The Hopsworks Feature Store for Machine Learning | 2024 | SIGMOD | 4.3690661e-05
10,243 | TPCx-AI under the Microscope: A Benchmarking Debt Analysis | 2026 | VLDB | 4.1945683e-05
10,252 | CAPS: Cost-Aware ML Pipeline Selection | 2026 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 21 of 21 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
95 | Maintaining Views Incrementally | 1993 | SIGMOD | 0.00050896659
158 | Automated Selection of Materialized Views and Indexes for SQL Databases | 2000 | VLDB | 0.00040071492
481 | Incremental Maintenance of Views with Duplicates | 1995 | SIGMOD | 0.00022167223
731 | Optimizing Queries Using Materialized Views: A Practical, Scalable Solution | 2001 | SIGMOD | 0.00017468889
761 | Materialization Optimizations for Feature Selection Workloads | 2014 | SIGMOD | 0.00017053783
1,059 | Answering Complex SQL Queries Using Automatic Summary Tables | 2000 | SIGMOD | 0.00014382575
1,155 | A Scalable Algorithm for Answering Queries Using Views | 2000 | VLDB | 0.00013616518
1,911 | Algorithms for Materialized View Design in Data Warehousing Environment | 1997 | VLDB | 0.00010120234
1,922 | Selecting Subexpressions to Materialize at Datacenter Scale | 2018 | VLDB | 0.00010082599
2,401 | Physical Data Independence, Constraints, and Optimization with Universal Plans | 1999 | VLDB | 8.8954126e-05
3,875 | Cloudy with High Chance of DBMS: A 10-year Prediction for Enterprise-Grade ML | 2020 | CIDR | 6.675257e-05
4,174 | Computation Reuse in Analytics Job Service at Microsoft | 2018 | SIGMOD | 6.3856219e-05
4,966 | Relative Error Streaming Quantiles | 2021 | PODS | 5.7959749e-05
5,605 | TPCx-AI - An Industry Standard Benchmark for Artificial Intelligence and Machine Learning Systems | 2023 | VLDB | 5.4142007e-05
5,627 | KLL± Approximate Quantile Sketches over Dynamic Datasets | 2021 | VLDB | 5.403782e-05
6,228 | Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems | 2021 | VLDB | 5.1470042e-05
6,247 | Optimizing In-memory Database Engine for AI-powered On-line Decision Augmentation Using Persistent Memory | 2021 | VLDB | 5.1389201e-05
6,469 | Materialization and Reuse Optimizations for Production Data Science Pipelines | 2022 | SIGMOD | 5.0519488e-05
8,514 | UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads | 2022 | VLDB | 4.4944285e-05
8,826 | Delta: Scalable Data Dissemination under Capacity Constraints | 2014 | VLDB | 4.441364e-05
9,344 | Hippo: Sharing Computations in Hyper-Parameter Optimization | 2022 | VLDB | 4.3539442e-05

Semantically Similar Papers