Opening the Black Boxes in Data Flow Optimization
Summary: The paper optimizes data flows built from black-box UDFs. Static analysis of the UDF code extracts a few operator properties, which determine when operators can be safely reordered. The optimizer enables selection reordering, bushy join ordering, and limited aggregation push-down for non-relational flows, achieving relational-like rewrite power without relying on operator semantics.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 10363
- Venue: VLDB
- Year: 2012
- Pagerank: 8.4536967e-05
- Overall Rank: 2,611 | 81.84%
Incoming Citations (Sorted by Pagerank)
Showing 21 of 21 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 1,402 | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML | 2014 | VLDB | 0.00012180605 |
| 1,882 | Tuplex: Data Science in Python at Native Code Speed | 2021 | SIGMOD | 0.0001021625 |
| 2,172 | Spinning Fast Iterative Data Flows | 2012 | VLDB | 9.3706587e-05 |
| 2,350 | An Intermediate Representation for Optimizing Machine Learning Pipelines | 2019 | VLDB | 8.9788641e-05 |
| 3,265 | RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! - | 2018 | VLDB | 7.3083672e-05 |
| 4,909 | A Method for Optimizing Opaque Filter Queries | 2020 | SIGMOD | 5.8338804e-05 |
| 5,014 | Dynamically Optimizing Queries over Large Scale Data Platforms | 2014 | SIGMOD | 5.7586174e-05 |
| 5,427 | The NebulaStream Platform: Data and Application Management for the Internet of Things | 2020 | CIDR | 5.509468e-05 |
| 5,476 | Containerized Execution of UDFs: An Experimental Evaluation | 2022 | VLDB | 5.4866534e-05 |
| 6,075 | Opportunistic Physical Design for Big Data Analytics | 2014 | SIGMOD | 5.223901e-05 |
| 6,322 | The BUDS Language for Distributed Bayesian Machine Learning | 2017 | SIGMOD | 5.1124615e-05 |
| 7,306 | DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines | 2022 | CIDR | 4.7678574e-05 |
| 7,805 | Scaling a Declarative Cluster Manager Architecture with Query Optimization Techniques | 2023 | VLDB | 4.6462265e-05 |
| 8,444 | Not Black-Box Anymore! Enabling Analytics-Aware Optimizations in Teradata Vantage | 2021 | VLDB | 4.5118994e-05 |
| 9,187 | POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance | 2024 | VLDB | 4.3780059e-05 |
| 9,376 | Versatile Optimization of UDF-heavy Data Flows with Sofa | 2014 | SIGMOD | 4.347376e-05 |
| 9,519 | PAXQuery: Parallel Analytical XML Processing | 2015 | SIGMOD | 4.3323764e-05 |
| 9,846 | HyperBlocker: Accelerating Rule-based Blocking in Entity Resolution using GPUs | 2025 | VLDB | 4.2721228e-05 |
| 10,770 | cedar: Optimized and Unified Machine Learning Input Data Pipelines | 2025 | VLDB | 4.1945683e-05 |
| 12,030 | How Achaeans Would Construct Columns in Troy | 2013 | CIDR | 4.1945683e-05 |
| 12,039 | Iterative Parallel Data Processing with Stratosphere: An Inside Look | 2013 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 13 of 13 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 1 | Access Path Selection in a Relational Database Management System | 1979 | SIGMOD | 0.0040449103 |
| 3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614 |
| 22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613 |
| 51 | Including Group-By in Query Optimization | 1994 | VLDB | 0.00067123727 |
| 70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166 |
| 868 | Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs | 2011 | VLDB | 0.00015789681 |
| 913 | Tenzing A SQL Implementation On The MapReduce Framework | 2011 | VLDB | 0.00015408131 |
| 1,265 | Jaql: A Scripting Language for Large Scale Semistructured Data Analysis | 2011 | VLDB | 0.00012947629 |
| 1,280 | Automatic Optimization for MapReduce Programs | 2011 | VLDB | 0.0001285503 |
| 1,341 | Dynamic Programming Strikes Back | 2008 | SIGMOD | 0.00012486285 |
| 1,355 | SQL/MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions | 2009 | VLDB | 0.00012404572 |
| 2,508 | One Size Fits All? – Part 2: Benchmarking Results | 2007 | CIDR | 8.6308132e-05 |
| 4,209 | FERRY - Database-Supported Program Execution | 2009 | SIGMOD | 6.3572003e-05 |
Semantically Similar Papers