cedar: Optimized and Unified Machine Learning Input Data Pipelines

Summary: Cedar is a unified programming framework for ML input pipelines that exposes composable operators usable across ML frameworks and libraries. Its extensible optimizer systematically applies fusion and scheduling optimizations and orchestrates local and distributed execution to meet throughput requirements, achieving 1.87–10.65× speedups over state-of-the-art systems. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 14093
Venue: VLDB
Year: 2025
Pagerank: 4.1905499e-05
Overall Rank: 10,776 | 25.11%
DOI: 10.14778/3705829.3705861

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Overall Rank	Cited Paper	Year	Venue	Pagerank
66	Spark SQL: Relational Data Processing in Spark	2015	SIGMOD	0.00061707583
167	The Snowflake Elastic Data Warehouse	2016	SIGMOD	0.00039408116
201	LINQ: Reconciling Objects, Relations and XML in the .NET Framework	2006	SIGMOD	0.00034903121
536	The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing	2015	VLDB	0.00020651621
739	Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores	2020	VLDB	0.00017365933
1,500	Analyzing and Mitigating Data Stalls in DNN Training	2021	VLDB	0.00011636174
2,175	tf.data: A Machine Learning Data Processing Framework	2021	VLDB	9.3745231e-05
2,355	An Intermediate Representation for Optimizing Machine Learning Pipelines	2019	VLDB	8.9727612e-05
2,616	Opening the Black Boxes in Data Flow Optimization	2012	VLDB	8.4457819e-05
3,694	Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines	2022	SIGMOD	6.8316905e-05
4,175	FastFlow: Accelerating Deep Learning Model Training with Smart Offloading of Input Data Pipeline	2023	VLDB	6.3772575e-05
5,560	GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning	2023	SIGMOD	5.4350242e-05
8,343	FusionFlow: Accelerating Data Preprocessing for Machine Learning with CPU-GPU Cooperation	2024	VLDB	4.5366487e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
6,464	Materialization and Reuse Optimizations for Production Data Science Pipelines	2022	SIGMOD	5.0471003e-05
684	Cerebro: A Data System for Optimized Deep Learning Model Selection	2020	VLDB	0.00018152321
9,234	Modyn: Data-Centric Machine Learning Pipeline Orchestration	2025	SIGMOD	4.3648789e-05
3,694	Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines	2022	SIGMOD	6.8316905e-05
4,006	Data Platform for Machine Learning	2019	SIGMOD	6.5371762e-05
4,779	LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems	2021	SIGMOD	5.9259373e-05
8,117	SDPipe: A Semi-Decentralized Framework for Heterogeneity-aware Pipeline-parallel Training	2023	VLDB	4.5788485e-05
7,303	DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines	2022	CIDR	4.7632836e-05
10,409	Sequoia: An Accessible and Extensible Framework for Privacy-Preserving Machine Learning over Distributed Data	2025	SIGMOD	4.1905499e-05
2,175	tf.data: A Machine Learning Data Processing Framework	2021	VLDB	9.3745231e-05