Scheduling Data Processing Pipelines for Incremental Training on MLP-based Recommendation Models
Summary: Intra-pipeline prefetching hides feature-processing latency behind embedding lookup during incremental training of MLP-based recommendation models on CPU-limited clusters. Inter-pipeline scheduling overlaps critical communication with otherwise idle non-critical tasks, yielding a 1.36x speedup with RECS on TensorFlow.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 7117
- Venue: SIGMOD
- Year: 2025
- Pagerank: 4.456315e-05
- Overall Rank: 8,737 | 39.22%
- DOI: 10.1145/3722212.3724454
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 13 of 13 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 1,044 | DimmWitted: A Study of Main-Memory Statistical Analytics | 2014 | VLDB | 0.00014475229 |
| 1,504 | Analyzing and Mitigating Data Stalls in DNN Training | 2021 | VLDB | 0.00011642333 |
| 2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05 |
| 2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05 |
| 2,677 | HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework | 2022 | VLDB | 8.3268401e-05 |
| 2,688 | Accelerating Recommendation System Training by Leveraging Popular Choices | 2022 | VLDB | 8.2991144e-05 |
| 3,698 | Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines | 2022 | SIGMOD | 6.8340435e-05 |
| 4,180 | FastFlow: Accelerating Deep Learning Model Training with Smart Offloading of Input Data Pipeline | 2023 | VLDB | 6.3793352e-05 |
| 5,052 | HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training | 2022 | SIGMOD | 5.7337977e-05 |
| 5,552 | GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning | 2023 | SIGMOD | 5.4402488e-05 |
| 6,469 | Materialization and Reuse Optimizations for Production Data Science Pipelines | 2022 | SIGMOD | 5.0519488e-05 |
| 8,045 | MultiBiSage: A Web-Scale Recommendation System Using Multiple Bipartite Graphs at Pinterest | 2023 | VLDB | 4.5990229e-05 |
| 9,094 | FEC: Efficient Deep Recommendation Model Training with Flexible Embedding Communication | 2023 | SIGMOD | 4.3980444e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 4,180 | FastFlow: Accelerating Deep Learning Model Training with Smart Offloading of Input Data Pipeline | 2023 | VLDB | 6.3793352e-05 |
| 4,439 | TencentRec: Real-time Stream Recommendation in Practice | 2015 | SIGMOD | 6.1885354e-05 |
| 1,790 | StreamRec: A Real-Time Recommender System | 2011 | SIGMOD | 0.00010551363 |
| 6,469 | Materialization and Reuse Optimizations for Production Data Science Pipelines | 2022 | SIGMOD | 5.0519488e-05 |
| 2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05 |
| 3,698 | Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines | 2022 | SIGMOD | 6.8340435e-05 |
| 5,993 | DLRover-RM: Resource Optimization for Deep Recommendation Models Training in the Cloud | 2024 | VLDB | 5.2415551e-05 |
| 9,094 | FEC: Efficient Deep Recommendation Model Training with Flexible Embedding Communication | 2023 | SIGMOD | 4.3980444e-05 |
| 10,532 | IncrCP: Decomposing and Orchestrating Incremental Checkpoints for Effective Recommendation Model Training | 2025 | VLDB | 4.1945683e-05 |
| 2,688 | Accelerating Recommendation System Training by Leveraging Popular Choices | 2022 | VLDB | 8.2991144e-05 |