Database Paper Browser

An Intermediate Representation for Optimizing Machine Learning Pipelines

Summary: Lara is a declarative DSL whose intermediate representation (IR) covers end-to-end ML pipelines, unifying preprocessing, UDFs, control flow, and training. Monads enable cross-boundary operator pushdown and fusion, and combinators encode domain-specific operators to optimize data access, with speedups of up to 10× on dense and sparse data. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11847
Venue: VLDB
Year: 2019
Pagerank: 8.9788641e-05
Overall Rank: 2,350 | 83.66%
DOI: 10.14778/3342263.3342633

Incoming Citations (Sorted by Pagerank)

Showing 19 of 19 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05
2,804 | Extending Relational Query Processing with ML Inference | 2020 | CIDR | 8.0935487e-05
3,407 | End-to-end Optimization of Machine Learning Prediction Queries | 2022 | SIGMOD | 7.1295646e-05
3,875 | Cloudy with High Chance of DBMS: A 10-year Prediction for Enterprise-Grade ML | 2020 | CIDR | 6.675257e-05
4,557 | Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches | 2021 | VLDB | 6.087611e-05
4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05
4,957 | Doing More with Less: Characterizing Dataset Downsampling for AutoML | 2021 | VLDB | 5.8035715e-05
5,072 | Optimizing Machine Learning Inference Queries with Correlative Proxy Models | 2022 | VLDB | 5.7185674e-05
5,427 | The NebulaStream Platform: Data and Application Management for the Internet of Things | 2020 | CIDR | 5.509468e-05
5,731 | Babelfish: Efficient Execution of Polyglot Queries | 2022 | VLDB | 5.3502065e-05
6,796 | InferDB: In-Database Machine Learning Inference Using Indexes | 2024 | VLDB | 4.9241624e-05
7,476 | Lachesis: Automatic Partitioning for UDF-Centric Analytics | 2021 | VLDB | 4.7188928e-05
8,595 | Towards A Polyglot Framework for Factorized ML | 2021 | VLDB | 4.4889397e-05
8,980 | HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries | 2021 | SIGMOD | 4.4169807e-05
9,806 | The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format | 2024 | SIGMOD | 4.2805224e-05
10,177 | InferF: Declarative Factorization of AI/ML Inferences over Joins | 2026 | SIGMOD | 4.1945683e-05
10,770 | cedar: Optimized and Unified Machine Learning Input Data Pipelines | 2025 | VLDB | 4.1945683e-05
11,339 | Redundancy Elimination in Distributed Matrix Computation | 2022 | SIGMOD | 4.1945683e-05
11,513 | TraNCE: Transforming Nested Collections Efficiently | 2021 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 23 of 23 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
42 | A Comparison of Approaches to Large-Scale Data Analysis | 2009 | SIGMOD | 0.00073498298
51 | Including Group-By in Query Optimization | 1994 | VLDB | 0.00067123727
60 | Efficiently Compiling Efficient Query Plans for Modern Hardware | 2011 | VLDB | 0.00064439773
543 | MLbase: A Distributed Machine-learning System | 2013 | CIDR | 0.00020526854
557 | SystemML: Declarative Machine Learning on Spark | 2016 | VLDB | 0.00020197988
1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713
1,279 | Towards Linear Algebra over Normalized Data | 2017 | VLDB | 0.00012868394
1,402 | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML | 2014 | VLDB | 0.00012180605
1,750 | Weld: A Common Runtime for High Performance Data Analytics | 2017 | CIDR | 0.00010683647
1,967 | Compressed Linear Algebra for Large-Scale Machine Learning | 2016 | VLDB | 9.9131712e-05
2,172 | Spinning Fast Iterative Data Flows | 2012 | VLDB | 9.3706587e-05
2,611 | Opening the Black Boxes in Data Flow Optimization | 2012 | VLDB | 8.4536967e-05
2,747 | Stubby: A Transformation-based Optimizer for MapReduce Workflows | 2012 | VLDB | 8.1828918e-05
2,818 | Implicit Parallelism through Deep Language Embedding | 2015 | SIGMOD | 8.0665558e-05
2,838 | How to Architect a Query Compiler, Revisited | 2018 | SIGMOD | 8.0408472e-05
2,896 | Evaluating End-to-End Optimization for Data Analytics Applications in Weld | 2018 | VLDB | 7.9452051e-05
3,918 | On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML | 2018 | VLDB | 6.6315176e-05
3,948 | A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics | 2018 | VLDB | 6.5959084e-05
4,505 | SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning | 2017 | CIDR | 6.1327108e-05
5,257 | Probabilistic Demand Forecasting at Scale | 2017 | VLDB | 5.6003925e-05
8,078 | Meta-Dataflows: Efficient Exploratory Dataflow Jobs | 2018 | SIGMOD | 4.5914967e-05
9,437 | BlockJoin: Efficient Matrix Partitioning Through Joins | 2017 | VLDB | 4.3425552e-05
11,840 | Emma in Action: Declarative Dataflows for Scalable Data Analysis | 2016 | SIGMOD | 4.1945683e-05

Semantically Similar Papers