Database Paper Browser

On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML

Summary: Exact, cost-based fusion optimization for large-scale ML in SystemML; handles complex operator DAGs and hybrid local/distributed execution. Integrated with candidate exploration and code generation for dense, sparse, and compressed data; up to 22x speedups with negligible compilation overhead. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11657
Venue: VLDB
Year: 2018
Pagerank: 6.6315176e-05
Overall Rank: 3,918 | 72.75%
DOI: 10.14778/3229863.3229865

Authors

Incoming Citations (Sorted by Pagerank)

Showing 22 of 22 citing papers.

Rank Citing Paper Year Venue Pagerank
2,122 SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle 2020 CIDR 9.4989076e-05
2,350 An Intermediate Representation for Optimizing Machine Learning Pipelines 2019 VLDB 8.9788641e-05
4,557 Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches 2021 VLDB 6.087611e-05
4,774 LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems 2021 SIGMOD 5.9316087e-05
4,833 MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions 2019 SIGMOD 5.8916346e-05
5,487 SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra 2020 VLDB 5.4791501e-05
6,156 Optimizing Tensor Programs on Flexible Storage 2023 SIGMOD 5.1802603e-05
7,306 DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines 2022 CIDR 4.7678574e-05
8,157 TOD: GPU-accelerated Outlier Detection via Tensor Operations 2023 VLDB 4.5730908e-05
8,262 FuseME: Distributed Matrix Computation Engine based on Cuboid-based Fused Operator and Plan Generation 2022 SIGMOD 4.5467867e-05
8,514 UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads 2022 VLDB 4.4944285e-05
8,583 Efficient Execution of User-Defined Functions in SQL Queries 2023 VLDB 4.4919445e-05
8,620 PreVision: An Out-of-Core Matrix Computation System with Optimal Buffer Replacement 2024 SIGMOD 4.4837361e-05
8,786 AWARE: Workload-aware, Redundancy-exploiting Linear Algebra 2023 SIGMOD 4.4521262e-05
8,980 HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries 2021 SIGMOD 4.4169807e-05
9,222 Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning 2021 VLDB 4.3698672e-05
9,326 BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach 2023 SIGMOD 4.3556432e-05
9,379 GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example 2023 SIGMOD 4.3462787e-05
9,763 The UDFBench Benchmark for General-purpose UDF Queries 2025 VLDB 4.2856106e-05
10,571 Quantum Data Management in the NISQ Era 2025 VLDB 4.1945683e-05
11,339 Redundancy Elimination in Distributed Matrix Computation 2022 SIGMOD 4.1945683e-05
11,472 Hybrid Evaluation for Distributed Iterative Matrix Computation 2021 SIGMOD 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 41 of 41 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
16 Magic Sets and Other Strange Ways to Implement Logic Programs (Extended Abstract) 1986 PODS 0.0010066783
60 Efficiently Compiling Efficient Query Plans for Modern Hardware 2011 VLDB 0.00064439773
66 Spark SQL: Relational Data Processing in Spark 2015 SIGMOD 0.00061639801
99 On the Propagation of Errors in the Size of Join Results 1991 SIGMOD 0.00050022914
168 MAD Skills: New Analysis Practices for Big Data 2009 VLDB 0.00038946305
359 Self-Driving Database Management Systems 2017 CIDR 0.0002592783
557 SystemML: Declarative Machine Learning on Spark 2016 VLDB 0.00020197988
704 Building Efficient Query Engines in a High-Level Language 2014 VLDB 0.00017900583
1,263 Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation 2016 SIGMOD 0.00012982857
1,299 The DataPath System: A Data-Centric Analytic Processing Engine for Large Data Warehouses 2010 SIGMOD 0.00012751522
1,341 Dynamic Programming Strikes Back 2008 SIGMOD 0.00012486285
1,402 Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML 2014 VLDB 0.00012180605
1,429 A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses 2009 VLDB 0.00012033518
1,476 Efficient Exploitation of Similar Subexpressions for Query Processing 2007 SIGMOD 0.00011779092
1,619 Adaptive Optimization of Very Large Join Queries 2018 SIGMOD 0.00011111678
1,730 Conditioning Probabilistic Databases 2008 VLDB 0.00010736755
1,750 Weld: A Common Runtime for High Performance Data Analytics 2017 CIDR 0.00010683647
1,826 Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees without Cross Products 2006 VLDB 0.00010400425
1,864 Relaxed Operator Fusion for In-Memory Databases: Making Compilation, Vectorization, and Prefetching Work Together At Last 2018 VLDB 0.00010280966
1,873 An Architecture for Compiling UDF-centric Workflows 2015 VLDB 0.00010253002
1,967 Compressed Linear Algebra for Large-Scale Machine Learning 2016 VLDB 9.9131712e-05
2,014 Voodoo - A Vector Algebra for Portable Database Performance on Modern Hardware 2016 VLDB 9.7904029e-05
2,287 Pipelined Query Processing in Coprocessor Environments 2018 SIGMOD 9.0972606e-05
2,383 How to Architect a Query Compiler 2016 SIGMOD 8.9294108e-05
2,410 Scalable Join Processing on Very Large RDF Graphs 2009 SIGMOD 8.8773796e-05
2,418 Tupleware: "Big" Data, Big Analytics, Small Clusters 2015 CIDR 8.8556595e-05
2,667 Cumulon: Optimizing Statistical Data Analysis in the Cloud 2013 SIGMOD 8.3413995e-05
2,747 Stubby: A Transformation-based Optimizer for MapReduce Workflows 2012 VLDB 8.1828918e-05
2,818 Implicit Parallelism through Deep Language Embedding 2015 SIGMOD 8.0665558e-05
2,848 Exploiting Matrix Dependency for Efficient Distributed Matrix Computation 2015 SIGMOD 8.0208832e-05
2,925 Shared Workload Optimization 2014 VLDB 7.888494e-05
3,284 Configuration-Parametric Query Optimization for Physical Design Tuning 2008 SIGMOD 7.2790444e-05
3,462 Efficient and Provable Multi-Query Optimization 2017 PODS 7.0703696e-05
3,666 Bypassing Joins in Disjunctive Queries 1995 VLDB 6.8618006e-05
4,326 Fast Queries Over Heterogeneous Data Through Engine Customization 2016 VLDB 6.288323e-05
4,505 SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning 2017 CIDR 6.1327108e-05
4,738 Query Simplification: Graceful Degradation for Join-Order Optimization 2009 SIGMOD 5.9600502e-05
4,802 Resource Elasticity for Large-Scale Machine Learning 2015 SIGMOD 5.9114415e-05
5,272 Micro-architectural Analysis of In-memory OLTP 2016 SIGMOD 5.5937875e-05
7,823 Measuring and Optimizing Distributed Array Programs 2016 VLDB 4.6419393e-05
7,878 DBToaster: Agile Views in a Dynamic Data Management System 2011 CIDR 4.6295401e-05

Semantically Similar Papers