Database Paper Browser

Opening the Black Boxes in Data Flow Optimization

Summary: The paper optimizes data flows built from black-box UDFs. It statically analyzes the UDF code to extract a few operator properties, then uses those properties to decide when operators can be reordered safely. This enables selection reordering, bushy join ordering, and limited aggregation push-down for non-relational flows, giving rewrite power comparable to a relational optimizer without relying on operator semantics. (summarized by gpt-5-nano on Feb 09 2026)
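
The reordering idea in the summary can be illustrated with a minimal sketch. It assumes that the extracted properties are the set of fields each UDF reads and the set it writes; the summary does not name the properties, so this model, the OpProps class, the can_swap function, and the example operators are all hypothetical and are not taken from the paper. The check is deliberately conservative: two adjacent record-at-a-time operators are treated as swappable only when neither writes a field the other reads or writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpProps:
    """Hypothetical per-operator properties (an assumed model, not the paper's)."""
    name: str
    reads: frozenset   # fields the UDF reads
    writes: frozenset  # fields the UDF adds or modifies


def can_swap(a: OpProps, b: OpProps) -> bool:
    """Conservative test for swapping two adjacent record-at-a-time operators.

    Returns True only if neither operator writes a field that the other
    reads or writes, so the result cannot depend on their order.
    """
    return (
        a.writes.isdisjoint(b.reads)
        and b.writes.isdisjoint(a.reads)
        and a.writes.isdisjoint(b.writes)
    )


# Hypothetical operators: one map that derives "domain" from "url",
# and two filters that each inspect a single field.
enrich = OpProps("enrich", frozenset({"url"}), frozenset({"domain"}))
filter_lang = OpProps("filter_lang", frozenset({"lang"}), frozenset())
filter_domain = OpProps("filter_domain", frozenset({"domain"}), frozenset())

print(can_swap(enrich, filter_lang))    # filter touches no field that enrich writes
print(can_swap(enrich, filter_domain))  # filter reads the field that enrich writes
```

In this sketch the filter on "lang" could be moved ahead of the map, while the filter on "domain" could not, because it depends on a field the map produces. A real optimizer would need further properties (for example, how many records an operator emits per input) before reordering joins or aggregations.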

Paper ID: 10363
Venue: VLDB
Year: 2012
Pagerank: 8.4536967e-05
Overall Rank: 2,611 | 81.84%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 21 of 21 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,402 | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML | 2014 | VLDB | 0.00012180605
1,882 | Tuplex: Data Science in Python at Native Code Speed | 2021 | SIGMOD | 0.0001021625
2,172 | Spinning Fast Iterative Data Flows | 2012 | VLDB | 9.3706587e-05
2,350 | An Intermediate Representation for Optimizing Machine Learning Pipelines | 2019 | VLDB | 8.9788641e-05
3,265 | RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! - | 2018 | VLDB | 7.3083672e-05
4,909 | A Method for Optimizing Opaque Filter Queries | 2020 | SIGMOD | 5.8338804e-05
5,014 | Dynamically Optimizing Queries over Large Scale Data Platforms | 2014 | SIGMOD | 5.7586174e-05
5,427 | The NebulaStream Platform: Data and Application Management for the Internet of Things | 2020 | CIDR | 5.509468e-05
5,476 | Containerized Execution of UDFs: An Experimental Evaluation | 2022 | VLDB | 5.4866534e-05
6,075 | Opportunistic Physical Design for Big Data Analytics | 2014 | SIGMOD | 5.223901e-05
6,322 | The BUDS Language for Distributed Bayesian Machine Learning | 2017 | SIGMOD | 5.1124615e-05
7,306 | DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines | 2022 | CIDR | 4.7678574e-05
7,805 | Scaling a Declarative Cluster Manager Architecture with Query Optimization Techniques | 2023 | VLDB | 4.6462265e-05
8,444 | Not Black-Box Anymore! Enabling Analytics-Aware Optimizations in Teradata Vantage | 2021 | VLDB | 4.5118994e-05
9,187 | POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance | 2024 | VLDB | 4.3780059e-05
9,376 | Versatile Optimization of UDF-heavy Data Flows with Sofa | 2014 | SIGMOD | 4.347376e-05
9,519 | PAXQuery: Parallel Analytical XML Processing | 2015 | SIGMOD | 4.3323764e-05
9,846 | HyperBlocker: Accelerating Rule-based Blocking in Entity Resolution using GPUs | 2025 | VLDB | 4.2721228e-05
10,770 | cedar: Optimized and Unified Machine Learning Input Data Pipelines | 2025 | VLDB | 4.1945683e-05
12,030 | How Achaeans Would Construct Columns in Troy | 2013 | CIDR | 4.1945683e-05
12,039 | Iterative Parallel Data Processing with Stratosphere: An Inside Look | 2013 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers