An Intermediate Representation for Optimizing Machine Learning Pipelines
Summary: Lara, a declarative domain-specific language (DSL), provides an intermediate representation (IR) for end-to-end ML pipelines that unifies preprocessing, UDFs, control flow, and training. Monads enable cross-boundary pushdown and fusion, and combinators encode domain-specific operators to optimize data access, yielding speedups of up to 10× on dense and sparse data.
(summarized by gpt-5-nano on Feb 09 2026)
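The summary mentions fusion of operators across a pipeline. The idea can be illustrated in miniature with the minimal sketch below. It assumes a toy collection IR made of `Map` and `Filter` nodes. Adjacent maps are fused by function composition, which reflects the law map g ∘ map f = map (g ∘ f). Adjacent filters are fused by conjunction. The names `Map`, `Filter`, `fuse`, and `run` are hypothetical: this is a generic loop-fusion illustration, not Lara's actual IR, monad encoding, or API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Union


# Hypothetical operator nodes for a toy collection IR (not Lara's API).
@dataclass
class Map:
    fn: Callable[[Any], Any]


@dataclass
class Filter:
    pred: Callable[[Any], bool]


Op = Union[Map, Filter]


def fuse(ops: List[Op]) -> List[Op]:
    """Merge adjacent Maps by composition and adjacent Filters by conjunction."""
    fused: List[Op] = []
    for op in ops:
        prev = fused[-1] if fused else None
        if isinstance(op, Map) and isinstance(prev, Map):
            f, g = prev.fn, op.fn
            # Default arguments bind f and g now, not at call time.
            fused[-1] = Map(lambda x, f=f, g=g: g(f(x)))
        elif isinstance(op, Filter) and isinstance(prev, Filter):
            p, q = prev.pred, op.pred
            fused[-1] = Filter(lambda x, p=p, q=q: p(x) and q(x))
        else:
            fused.append(op)
    return fused


def run(ops: List[Op], data: Iterable[Any]) -> List[Any]:
    """Push each element through all operators without building intermediate lists."""
    out = []
    for x in data:
        keep = True
        for op in ops:
            if isinstance(op, Map):
                x = op.fn(x)
            elif not op.pred(x):
                keep = False
                break
        if keep:
            out.append(x)
    return out


if __name__ == "__main__":
    pipeline = [
        Map(lambda x: x * 2),
        Map(lambda x: x + 1),
        Filter(lambda x: x > 5),
        Filter(lambda x: x % 2 == 1),
    ]
    fused = fuse(pipeline)
    print(len(pipeline), "->", len(fused), "operators")  # prints 4 -> 2 operators
    print(run(fused, range(6)))  # prints [7, 9, 11]
```

Fusion leaves the result unchanged and reduces the number of operator dispatches per element. A real optimizer would also have to decide when rewrites such as pushing a filter below a map are semantically valid. This toy version does not attempt that.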
- Paper ID: 11847
- Venue: VLDB
- Year: 2019
- Pagerank: 8.9788641e-05
- Overall Rank: 2,350 | 83.66%
- DOI: 10.14778/3342263.3342633
Incoming Citations (Sorted by Pagerank)
Showing 19 of 19 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05 |
| 2,804 | Extending Relational Query Processing with ML Inference | 2020 | CIDR | 8.0935487e-05 |
| 3,407 | End-to-end Optimization of Machine Learning Prediction Queries | 2022 | SIGMOD | 7.1295646e-05 |
| 3,875 | Cloudy with High Chance of DBMS: A 10-year Prediction for Enterprise-Grade ML | 2020 | CIDR | 6.675257e-05 |
| 4,557 | Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches | 2021 | VLDB | 6.087611e-05 |
| 4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05 |
| 4,957 | Doing More with Less: Characterizing Dataset Downsampling for AutoML | 2021 | VLDB | 5.8035715e-05 |
| 5,072 | Optimizing Machine Learning Inference Queries with Correlative Proxy Models | 2022 | VLDB | 5.7185674e-05 |
| 5,427 | The NebulaStream Platform: Data and Application Management for the Internet of Things | 2020 | CIDR | 5.509468e-05 |
| 5,731 | Babelfish: Efficient Execution of Polyglot Queries | 2022 | VLDB | 5.3502065e-05 |
| 6,796 | InferDB: In-Database Machine Learning Inference Using Indexes | 2024 | VLDB | 4.9241624e-05 |
| 7,476 | Lachesis: Automatic Partitioning for UDF-Centric Analytics | 2021 | VLDB | 4.7188928e-05 |
| 8,595 | Towards A Polyglot Framework for Factorized ML | 2021 | VLDB | 4.4889397e-05 |
| 8,980 | HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries | 2021 | SIGMOD | 4.4169807e-05 |
| 9,806 | The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format | 2024 | SIGMOD | 4.2805224e-05 |
| 10,177 | InferF: Declarative Factorization of AI/ML Inferences over Joins | 2026 | SIGMOD | 4.1945683e-05 |
| 10,770 | cedar: Optimized and Unified Machine Learning Input Data Pipelines | 2025 | VLDB | 4.1945683e-05 |
| 11,339 | Redundancy Elimination in Distributed Matrix Computation | 2022 | SIGMOD | 4.1945683e-05 |
| 11,513 | TraNCE: Transforming Nested Collections Efficiently | 2021 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 23 of 23 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 42 | A Comparison of Approaches to Large-Scale Data Analysis | 2009 | SIGMOD | 0.00073498298 |
| 51 | Including Group-By in Query Optimization | 1994 | VLDB | 0.00067123727 |
| 60 | Efficiently Compiling Efficient Query Plans for Modern Hardware | 2011 | VLDB | 0.00064439773 |
| 543 | MLbase: A Distributed Machine-learning System | 2013 | CIDR | 0.00020526854 |
| 557 | SystemML: Declarative Machine Learning on Spark | 2016 | VLDB | 0.00020197988 |
| 1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713 |
| 1,279 | Towards Linear Algebra over Normalized Data | 2017 | VLDB | 0.00012868394 |
| 1,402 | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML | 2014 | VLDB | 0.00012180605 |
| 1,750 | Weld: A Common Runtime for High Performance Data Analytics | 2017 | CIDR | 0.00010683647 |
| 1,967 | Compressed Linear Algebra for Large-Scale Machine Learning | 2016 | VLDB | 9.9131712e-05 |
| 2,172 | Spinning Fast Iterative Data Flows | 2012 | VLDB | 9.3706587e-05 |
| 2,611 | Opening the Black Boxes in Data Flow Optimization | 2012 | VLDB | 8.4536967e-05 |
| 2,747 | Stubby: A Transformation-based Optimizer for MapReduce Workflows | 2012 | VLDB | 8.1828918e-05 |
| 2,818 | Implicit Parallelism through Deep Language Embedding | 2015 | SIGMOD | 8.0665558e-05 |
| 2,838 | How to Architect a Query Compiler, Revisited | 2018 | SIGMOD | 8.0408472e-05 |
| 2,896 | Evaluating End-to-End Optimization for Data Analytics Applications in Weld | 2018 | VLDB | 7.9452051e-05 |
| 3,918 | On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML | 2018 | VLDB | 6.6315176e-05 |
| 3,948 | A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics | 2018 | VLDB | 6.5959084e-05 |
| 4,505 | SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning | 2017 | CIDR | 6.1327108e-05 |
| 5,257 | Probabilistic Demand Forecasting at Scale | 2017 | VLDB | 5.6003925e-05 |
| 8,078 | Meta-Dataflows: Efficient Exploratory Dataflow Jobs | 2018 | SIGMOD | 4.5914967e-05 |
| 9,437 | BlockJoin: Efficient Matrix Partitioning Through Joins | 2017 | VLDB | 4.3425552e-05 |
| 11,840 | Emma in Action: Declarative Dataflows for Scalable Data Analysis | 2016 | SIGMOD | 4.1945683e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,698 | Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines | 2022 | SIGMOD | 6.8340435e-05 |
| 1,967 | Compressed Linear Algebra for Large-Scale Machine Learning | 2016 | VLDB | 9.9131712e-05 |
| 6,469 | Materialization and Reuse Optimizations for Production Data Science Pipelines | 2022 | SIGMOD | 5.0519488e-05 |
| 6,191 | Automatic Optimization of Matrix Implementations for Distributed Machine Learning and Linear Algebra | 2021 | SIGMOD | 5.1642282e-05 |
| 1,279 | Towards Linear Algebra over Normalized Data | 2017 | VLDB | 0.00012868394 |
| 6,291 | Lightweight Inspection of Data Preprocessing in Native Machine Learning Pipelines | 2021 | CIDR | 5.1269764e-05 |
| 4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05 |
| 3,918 | On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML | 2018 | VLDB | 6.6315176e-05 |
| 4,505 | SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning | 2017 | CIDR | 6.1327108e-05 |
| 5,487 | SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra | 2020 | VLDB | 5.4791501e-05 |