cedar: Optimized and Unified Machine Learning Input Data Pipelines
Summary: cedar is a unified programming framework for ML input data pipelines that exposes composable operators usable across ML frameworks and libraries. Its extensible optimizer systematically applies optimizations such as operator fusion and scheduling, and it orchestrates local and distributed execution to meet throughput requirements, achieving 1.87–10.65× speedups over state-of-the-art systems.
(summarized by gpt-5-mini on Feb 09 2026)
- Paper ID: 14092
- Venue: VLDB
- Year: 2025
- Pagerank: 4.1945683e-05
- Overall Rank: 10,770 | 25.08%
- DOI: 10.14778/3705829.3705861
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 13 of 13 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801 |
| 167 | The Snowflake Elastic Data Warehouse | 2016 | SIGMOD | 0.00039180521 |
| 202 | LINQ: Reconciling Objects, Relations and XML in the .NET Framework | 2006 | SIGMOD | 0.00034920912 |
| 538 | The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing | 2015 | VLDB | 0.00020678804 |
| 746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979 |
| 1,504 | Analyzing and Mitigating Data Stalls in DNN Training | 2021 | VLDB | 0.00011642333 |
| 2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05 |
| 2,350 | An Intermediate Representation for Optimizing Machine Learning Pipelines | 2019 | VLDB | 8.9788641e-05 |
| 2,611 | Opening the Black Boxes in Data Flow Optimization | 2012 | VLDB | 8.4536967e-05 |
| 3,698 | Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines | 2022 | SIGMOD | 6.8340435e-05 |
| 4,180 | FastFlow: Accelerating Deep Learning Model Training with Smart Offloading of Input Data Pipeline | 2023 | VLDB | 6.3793352e-05 |
| 5,552 | GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning | 2023 | SIGMOD | 5.4402488e-05 |
| 8,348 | FusionFlow: Accelerating Data Preprocessing for Machine Learning with CPU-GPU Cooperation | 2024 | VLDB | 4.5410024e-05 |
Semantically Similar Papers