Database Paper Browser

Pushing Data-Induced Predicates Through Joins in Big-Data Clusters

Summary: Data-induced predicates (diPs) use data statistics to translate a predicate on one table into a predicate on the table it joins with, extending predicate pushdown across joins. Built from zone-maps that carry a slightly larger statistic, diPs enable data skipping at plan time: on TPC-H, TPC-DS, and JoinOrder, roughly 50% of queries skip at least 33% of their input, and median query time improves by about 2×. (summarized by gpt-5-nano on Feb 09 2026)
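
The idea in the summary can be illustrated with a minimal sketch. It is not the paper's implementation: the `Partition` class, the helper functions, the table and column names (`orders`, `lineitem`, `o_date`, `o_key`, `l_key`), and all values are hypothetical, and the zone-maps here hold only a plain min/max per column rather than the slightly larger statistic the summary mentions.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """One partition plus its zone-map: column name -> (min, max). Hypothetical layout."""
    name: str
    zone_map: dict


def overlaps(a, b):
    """True if the closed intervals a and b share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


def filter_partitions(partitions, column, ranges):
    """Keep partitions whose zone-map range on `column` overlaps any given range."""
    return [p for p in partitions
            if any(overlaps(p.zone_map[column], r) for r in ranges)]


def induce_predicate(partitions, join_column):
    """Derive a predicate on the join column: the ranges it spans in `partitions`."""
    return [p.zone_map[join_column] for p in partitions]


# Made-up zone-maps for two joining tables.
orders = [
    Partition("o1", {"o_date": (1, 10), "o_key": (1, 100)}),
    Partition("o2", {"o_date": (11, 20), "o_key": (101, 200)}),
    Partition("o3", {"o_date": (21, 30), "o_key": (201, 300)}),
]
lineitem = [
    Partition("l1", {"l_key": (1, 150)}),
    Partition("l2", {"l_key": (151, 250)}),
    Partition("l3", {"l_key": (251, 400)}),
]

# Predicate on orders: o_date BETWEEN 12 AND 18.
matching_orders = filter_partitions(orders, "o_date", [(12, 18)])

# Translate it into a predicate on the join key, then apply it to lineitem
# using only statistics, before any rows are read.
dip = induce_predicate(matching_orders, "o_key")
kept = filter_partitions(lineitem, "l_key", dip)
print([p.name for p in kept])
```

In this toy data only `o2` can satisfy the date predicate, so the induced predicate on the join key is the range (101, 200), and the `l3` partition of `lineitem` is skipped even though the query places no predicate on `lineitem` itself.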

Paper ID: 12130
Venue: VLDB
Year: 2020
Pagerank: 6.6291079e-05
Overall Rank: 3,922 | 72.72%
DOI: 10.14778/3368289.3368292

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 21 of 21 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,916 | Quantifying TPC-H Choke Points and Their Optimizations | 2020 | VLDB | 7.9068048e-05
3,407 | End-to-end Optimization of Machine Learning Prediction Queries | 2022 | SIGMOD | 7.1295646e-05
3,779 | Instance-Optimized Data Layouts for Cloud Analytics Workloads | 2021 | SIGMOD | 6.7747205e-05
5,765 | Predicate Transfer: Efficient Pre-Filtering on Multi-Join Queries | 2024 | CIDR | 5.336442e-05
6,149 | Crystal: A Unified Cache Storage System for Analytical Databases | 2021 | VLDB | 5.1847534e-05
6,261 | The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward | 2021 | VLDB | 5.1350714e-05
6,466 | Pando: Enhanced Data Skipping with Logical Data Partitioning | 2023 | VLDB | 5.0528281e-05
7,011 | Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis | 2023 | VLDB | 4.8629458e-05
7,283 | Sia: Optimizing Queries using Learned Predicates | 2021 | SIGMOD | 4.7764688e-05
7,427 | Selection Pushdown in Column Stores using Bit Manipulation Instructions | 2023 | SIGMOD | 4.7327406e-05
7,836 | NOCAP: Near-Optimal Correlation-Aware Partitioning Joins | 2023 | SIGMOD | 4.6380835e-05
8,222 | Sieve: A Learned Data-Skipping Index for Data Analytics | 2023 | VLDB | 4.5555621e-05
8,415 | Pruning in Snowflake: Working Smarter, Not Harder | 2025 | SIGMOD | 4.5197687e-05
8,502 | Conditional Cuckoo Filters | 2021 | SIGMOD | 4.4972336e-05
8,645 | Predicate Pushdown for Data Science Pipelines | 2023 | SIGMOD | 4.4772518e-05
8,758 | Hyperspace: The Indexing Subsystem of Azure Synapse | 2021 | VLDB | 4.456315e-05
8,781 | Accelerate Distributed Joins with Predicate Transfer | 2025 | SIGMOD | 4.4534753e-05
9,798 | Threshold Queries in Theory and in the Wild | 2022 | VLDB | 4.2818172e-05
10,404 | Dynamic Pruning for Recursive Joins | 2025 | SIGMOD | 4.1945683e-05
10,950 | PLAQUE: Automated Predicate Learning at Query Time | 2024 | SIGMOD | 4.1945683e-05
11,212 | SH2O: Efficient Data Access for Work-Sharing Databases | 2023 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 41 of 41 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614
16 | Magic Sets and Other Strange Ways to Implement Logic Programs (Extended Abstract) | 1986 | PODS | 0.0010066783
22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613
66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801
70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166
71 | How Good Are Query Optimizers, Really? | 2016 | VLDB | 0.00059038975
158 | Automated Selection of Materialized Views and Indexes for SQL Databases | 2000 | VLDB | 0.00040071492
224 | CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies | 2004 | SIGMOD | 0.00032746205
269 | Fast Incremental Maintenance of Approximate Histograms | 1997 | VLDB | 0.00029656549
310 | The Vertica Analytic Database: C-Store 7 Years Later | 2012 | VLDB | 0.00028132402
402 | Mergeable Summaries | 2012 | PODS | 0.00024196343
408 | Database Cracking | 2007 | CIDR | 0.00023953844
529 | Self-tuning Histograms: Building Histograms Without Looking at Data | 1999 | SIGMOD | 0.00020828852
661 | Database Tuning Advisor for Microsoft SQL Server 2005 | 2004 | VLDB | 0.00018481174
779 | Materialized View Maintenance and Integrity Constraint Checking: Trading Space for Time | 1996 | SIGMOD | 0.00016786961
906 | F1: A Distributed SQL Database That Scales | 2013 | VLDB | 0.00015448884
1,169 | SuRF: Practical Range Query Filtering with Fast Succinct Tries | 2018 | SIGMOD | 0.00013536447
1,302 | Query Optimization by Predicate Move-Around | 1994 | VLDB | 0.00012705525
1,313 | Cost-Based Optimization for Magic: Algebra and Implementation | 1996 | SIGMOD | 0.0001263831
1,477 | Fine-grained Partitioning for Aggressive Data Skipping | 2014 | SIGMOD | 0.00011770865
1,499 | Apache Hadoop Goes Realtime at Facebook | 2011 | SIGMOD | 0.00011675192
1,582 | Execution Strategies for SQL Subqueries | 2007 | SIGMOD | 0.00011265079
1,939 | From Theory to Practice: Efficient Join Query Evaluation in a Parallel Database System | 2015 | SIGMOD | 0.00010025655
1,974 | BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data | 2003 | VLDB | 9.8866171e-05
2,222 | Efficient View Maintenance at Data Warehouses | 1997 | SIGMOD | 9.2592356e-05
2,439 | CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop | 2011 | VLDB | 8.8190594e-05
2,444 | Brighthouse: An Analytic Data Warehouse for Ad-hoc Queries | 2008 | VLDB | 8.8076551e-05
2,772 | Quickstep: A Data Platform Based on the Scaling-Up Approach | 2018 | VLDB | 8.1401661e-05
2,837 | Correlation Maps: A Compressed Access Method for Exploiting Soft Functional Dependencies | 2009 | VLDB | 8.0414149e-05
2,987 | The Uncracked Pieces in Database Cracking | 2014 | VLDB | 7.7787088e-05
3,608 | Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation | 2018 | SIGMOD | 6.924272e-05
3,737 | Skipping-oriented Partitioning for Columnar Layouts | 2017 | VLDB | 6.8033227e-05
3,821 | Locality-aware Partitioning in Parallel Database Systems | 2015 | SIGMOD | 6.7281515e-05
3,891 | Slalom: Coasting Through Raw Data via Adaptive Partitioning and Indexing | 2017 | VLDB | 6.659442e-05
3,912 | Two Birds, One Stone: A Fast, yet Lightweight, Indexing Scheme for Modern Database Systems | 2017 | VLDB | 6.6354964e-05
4,199 | Implementation of Magic-sets in a Relational Database System | 1994 | SIGMOD | 6.3662839e-05
5,118 | AdaptDB: Adaptive Partitioning for Distributed Joins | 2017 | VLDB | 5.6820984e-05
7,053 | Statisticum: Data Statistics Management in SAP HANA | 2017 | VLDB | 4.8497195e-05
8,066 | Optimizing Iceberg Queries with Complex Joins | 2017 | SIGMOD | 4.5937212e-05
8,979 | High Performance Stream Query Processing With Correlation-Aware Partitioning | 2014 | VLDB | 4.4170433e-05
9,801 | Amoeba: A Shape changing Storage System for Big Data | 2016 | VLDB | 4.2815507e-05

Semantically Similar Papers