Database Paper Browser

Predicate Pushdown for Data Science Pipelines

Summary: MagicPush takes a search-and-verify approach to predicate pushdown in data science pipelines: it searches for candidate predicates over the pipeline inputs, then proves that pushing them down preserves the pipeline's output, even when the pipeline contains non-relational operators and UDFs. In evaluations on TPC-H and 200 real-world GitHub Notebook pipelines, it outperforms a strong rule-based baseline: it discovers new pushdown opportunities, reduces running time by up to 99% in 42 pipelines, and matches the baseline's pushdown opportunities in the rest. (summarized by gpt-5-nano on Feb 09 2026)
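
The search-and-verify idea in the summary can be illustrated with a toy sketch. This is not MagicPush's algorithm: the pipeline, the column names (`price`, `qty`, `total`), and the helper `pushdown_preserves_output` are all hypothetical, and the check only compares outputs on sample rows, which is much weaker than the proof the paper describes.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def pipeline(rows: List[Row]) -> List[Row]:
    """Toy pipeline (hypothetical): a row-wise UDF followed by a filter."""
    # UDF step: derive a new column from the input columns.
    derived = [dict(r, total=r["price"] * r["qty"]) for r in rows]
    # Filter on the derived column and on one input column.
    return [r for r in derived if r["total"] > 100 and r["qty"] > 2]


def pushdown_preserves_output(
    rows: List[Row], candidate: Callable[[Row], bool]
) -> bool:
    """Check, on sample rows only, that filtering the input with `candidate`
    before running the pipeline leaves the pipeline output unchanged."""
    original = pipeline(rows)
    pushed = pipeline([r for r in rows if candidate(r)])
    return original == pushed


# Sample input rows (made up for illustration).
sample = [
    {"price": 30.0, "qty": 4.0},
    {"price": 60.0, "qty": 1.0},
    {"price": 80.0, "qty": 3.0},
    {"price": 10.0, "qty": 5.0},
]

# A candidate implied by the final filter: safe to push to the input.
safe = pushdown_preserves_output(sample, lambda r: r["qty"] > 2)

# A candidate that drops a row the pipeline would have kept: rejected.
unsafe = pushdown_preserves_output(sample, lambda r: r["price"] > 50)
```

Here `safe` evaluates to `True` and `unsafe` to `False`: the second candidate removes the row with `price` 30 and `qty` 4, which the original pipeline keeps. A sample-based check like this can only reject bad candidates; accepting one soundly requires a proof over all inputs, which is the verification step the summary attributes to MagicPush.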

Paper ID: 6639
Venue: SIGMOD
Year: 2023
Pagerank: 4.4772518e-05
Overall Rank: 8,645 | 39.86%
DOI: 10.1145/3589281

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
9,763 | The UDFBench Benchmark for General-purpose UDF Queries | 2025 | VLDB | 4.2856106e-05
10,152 | Data-Semantics-Aware Recommendation of Diverse Pivot Tables | 2026 | SIGMOD | 4.1945683e-05
10,404 | Dynamic Pruning for Recursive Joins | 2025 | SIGMOD | 4.1945683e-05
10,854 | LiquidCache: Efficient Pushdown Caching for Cloud-Native Data Analytics | 2025 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 27 of 27 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801
139 | Predicate Migration: Optimizing Queries with Expensive Predicates | 1993 | SIGMOD | 0.00042299329
335 | Optimization of Real Conjunctive Queries | 1993 | PODS | 0.00027036073
1,108 | Froid: Optimization of Imperative Programs in a Relational Database | 2018 | VLDB | 0.00013984276
1,203 | PIVOT and UNPIVOT: Optimization and Execution Strategies in an RDBMS | 2004 | VLDB | 0.00013320373
1,302 | Query Optimization by Predicate Move-Around | 1994 | VLDB | 0.00012705525
1,611 | Qd-tree: Learning Data Layouts for Big Data Analytics | 2020 | SIGMOD | 0.00011147324
1,882 | Tuplex: Data Science in Python at Native Code Speed | 2021 | SIGMOD | 0.0001021625
2,127 | SQL-on-Hadoop: Full Circle Back to Shared-Nothing Database Architectures | 2014 | VLDB | 9.4863172e-05
2,596 | WeTune: Automatic Discovery and Verification of Query Rewrite Rules | 2022 | SIGMOD | 8.4729982e-05
2,819 | Mison: A Fast JSON Parser for Data Analytics | 2017 | VLDB | 8.0651326e-05
2,916 | Quantifying TPC-H Choke Points and Their Optimizations | 2020 | VLDB | 7.9068048e-05
2,954 | Magpie: Python at Speed and Scale using Cloud Backends | 2021 | CIDR | 7.8262582e-05
3,152 | AnalyticDB: Real-time OLAP Database System at Alibaba Cloud | 2019 | VLDB | 7.4711766e-05
3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05
3,432 | Demonstration of the Cosette Automated SQL Prover | 2017 | SIGMOD | 7.1008151e-05
3,901 | Automated Verification of Query Equivalence Using Satisfiability Modulo Theories | 2019 | VLDB | 6.6499845e-05
3,922 | Pushing Data-Induced Predicates Through Joins in Big-Data Clusters | 2020 | VLDB | 6.6291079e-05
4,648 | Aggify: Lifting the Curse of Cursor Loops using Custom Aggregates | 2020 | SIGMOD | 6.0247446e-05
4,667 | FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS | 2021 | VLDB | 6.0116919e-05
4,677 | Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications | 2018 | SIGMOD | 6.0047822e-05
6,149 | Crystal: A Unified Cache Storage System for Analytical Databases | 2021 | VLDB | 5.1847534e-05
6,673 | Incorporating Super-Operators in Big-Data Query Optimizers | 2020 | VLDB | 4.966799e-05
6,701 | YeSQL: “You extend SQL” with Rich and Highly Performant User-Defined Functions in Relational Databases | 2022 | VLDB | 4.9561066e-05
7,283 | Sia: Optimizing Queries using Learned Predicates | 2021 | SIGMOD | 4.7764688e-05
7,342 | Optimizing Recursive Queries with Program Synthesis | 2022 | SIGMOD | 4.7576316e-05
9,819 | Generating Application-Specific Data Layouts for In-memory Databases | 2019 | VLDB | 4.2774401e-05

Semantically Similar Papers