Database Paper Browser

End-to-end Optimization of Machine Learning Prediction Queries

Summary: Raven unifies data processing and ML inference in a single IR/graph to optimize prediction queries. Its data-driven runtime selection and logical-to-physical transformations span CPU/GPU and ML/DNN backends, delivering speedups of up to 13x on Spark and 330x on SQL Server. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6453
Venue: SIGMOD
Year: 2022
Pagerank: 7.1295646e-05
Overall Rank: 3,407 | 76.30%
DOI: 10.1145/3514221.3526141

Incoming Citations (Sorted by Pagerank)

Showing 17 of 17 citing papers.

Rank Citing Paper Year Venue Pagerank
5,476 Containerized Execution of UDFs: An Experimental Evaluation 2022 VLDB 5.4866534e-05
6,327 The Tensor Data Platform: Towards an AI-centric Database System 2023 CIDR 5.1083405e-05
6,378 Mitigating the Impedance Mismatch between Prediction Query Execution and Database Engine 2025 SIGMOD 5.0909804e-05
6,796 InferDB: In-Database Machine Learning Inference Using Indexes 2024 VLDB 4.9241624e-05
8,080 Biathlon: Harnessing Model Resilience for Accelerating ML Inference Pipelines 2024 VLDB 4.5911668e-05
8,257 Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines 2023 SIGMOD 4.5487511e-05
8,279 Galley: Modern Query Optimization for Sparse Tensor Programs 2025 SIGMOD 4.5435639e-05
8,688 NeurDB: On the Design and Implementation of an AI-powered Autonomous Database 2025 CIDR 4.4673127e-05
8,716 nsDB: Architecting the Next Generation Database by Integrating Neural and Symbolic Systems 2024 VLDB 4.4618187e-05
9,320 Powering In-Database Dynamic Model Slicing for Structured Data Analytics 2024 VLDB 4.3556432e-05
9,476 Adda: Towards Efficient in-Database Feature Generation via LLM-based Agents 2025 SIGMOD 4.3341665e-05
9,695 Share the Tensor Tea: How Databases can Leverage the Machine Learning Ecosystem 2022 VLDB 4.3025567e-05
9,806 The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format 2024 SIGMOD 4.2805224e-05
10,095 NeurStore: Efficient In-database Deep Learning Model Management System 2026 SIGMOD 4.1945683e-05
10,130 MorphingDB: A Task-Centric AI-Native DBMS for Model Management and Inference 2026 SIGMOD 4.1945683e-05
10,177 InferF: Declarative Factorization of AI/ML Inferences over Joins 2026 SIGMOD 4.1945683e-05
11,277 Sniffer: A Novel Model Type Detection System against Machine-Learning-as-a-Service Platforms 2023 VLDB 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 32 of 32 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
66 Spark SQL: Relational Data Processing in Spark 2015 SIGMOD 0.00061639801
140 The MADlib Analytics Library or MAD Skills, the SQL 2012 VLDB 0.00042270404
167 The Snowflake Elastic Data Warehouse 2016 SIGMOD 0.00039180521
329 Accelerating Machine Learning Inference with Probabilistic Predicates 2018 SIGMOD 0.00027249545
365 On the Power of Magic 1987 PODS 0.00025585898
557 SystemML: Declarative Machine Learning on Spark 2016 VLDB 0.00020197988
658 Towards a Unified Architecture for in-RDBMS Analytics 2012 SIGMOD 0.00018506577
746 Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores 2020 VLDB 0.00017326979
1,108 Froid: Optimization of Imperative Programs in a Relational Database 2018 VLDB 0.00013984276
1,167 Learning Generalized Linear Models Over Normalized Data 2015 SIGMOD 0.00013547713
1,532 Data Management in Machine Learning: Challenges, Techniques, and Systems 2017 SIGMOD 0.00011472681
2,170 tf.data: A Machine Learning Data Processing Framework 2021 VLDB 9.3821603e-05
2,350 An Intermediate Representation for Optimizing Machine Learning Pipelines 2019 VLDB 8.9788641e-05
2,545 POLARIS: The Distributed SQL Engine in Azure Synapse 2020 VLDB 8.5725413e-05
2,804 Extending Relational Query Processing with ML Inference 2020 CIDR 8.0935487e-05
2,896 Evaluating End-to-End Optimization for Data Analytics Applications in Weld 2018 VLDB 7.9452051e-05
2,954 Magpie: Python at Speed and Scale using Cloud Backends 2021 CIDR 7.8262582e-05
3,038 Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics 2017 SIGMOD 7.6717218e-05
3,099 DB4ML – An In-Memory Database Kernel with Machine Learning Support 2020 SIGMOD 7.5642871e-05
3,293 Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics 2021 VLDB 7.2629834e-05
3,331 A Demonstration of Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference 2020 VLDB 7.2131599e-05
3,875 Cloudy with High Chance of DBMS: A 10-year Prediction for Enterprise-Grade ML 2020 CIDR 6.675257e-05
3,922 Pushing Data-Induced Predicates Through Joins in Big-Data Clusters 2020 VLDB 6.6291079e-05
4,557 Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches 2021 VLDB 6.087611e-05
4,701 Tensors: An abstraction for general data processing 2021 VLDB 5.9866564e-05
4,748 Rafiki: Machine Learning as an Analytics Service System 2019 VLDB 5.9526539e-05
4,787 The Relational Data Borg is Learning 2020 VLDB 5.9224501e-05
5,487 SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra 2020 VLDB 5.4791501e-05
5,821 Tensor Relational Algebra for Distributed Machine Learning System Design 2021 VLDB 5.3134851e-05
6,291 Lightweight Inspection of Data Preprocessing in Native Machine Learning Pipelines 2021 CIDR 5.1269764e-05
6,982 Integration of Data Mining and Relational Databases 2000 VLDB 4.8734424e-05
8,444 Not Black-Box Anymore! Enabling Analytics-Aware Optimizations in Teradata Vantage 2021 VLDB 4.5118994e-05

Semantically Similar Papers