Versatile Optimization of UDF-heavy Data Flows with Sofa
Summary: Introduces Meteor, a declarative data-flow language, and Sofa, a logical optimizer for UDF-heavy analytics in Stratosphere. Sofa uses a compact set of UDF annotations and rewrite rules to derive semantically equivalent plans, organizes operators in a subsumption-based hierarchy for extensibility, and supports interactive, cost-based optimization with dependency visualization. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Astrid Rheinländer
2. Martin Beckmann
3. Anja Kunkel
4. Arvid Heise
5. Thomas Stoltmann
6. Ulf Leser
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,762 | QURE: AI-Assisted and Automatically Verified UDF Inlining | 2025 | SIGMOD | 4.2856106e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614 |
| 70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166 |
| 287 | Declarative Information Extraction Using Datalog with Embedded Extraction Predicates | 2007 | VLDB | 0.00028971272 |
| 1,265 | Jaql: A Scripting Language for Large Scale Semistructured Data Analysis | 2011 | VLDB | 0.00012947629 |
| 2,611 | Opening the Black Boxes in Data Flow Optimization | 2012 | VLDB | 8.4536967e-05 |
| 7,882 | Massively Parallel Data Analysis with PACTs on Nephele | 2010 | VLDB | 4.6285796e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,973 | End-to-End Declarative Data Analytics: Co-designing Engines, Interfaces, and Cloud Infrastructure | 2026 | CIDR | 4.1945683e-05 |
| 1,873 | An Architecture for Compiling UDF-centric Workflows | 2015 | VLDB | 0.00010253002 |
| 10,459 | UDFBench: A Tool for Benchmarking UDF Queries on SQL Engines | 2025 | SIGMOD | 4.1945683e-05 |
| 5,014 | Dynamically Optimizing Queries over Large Scale Data Platforms | 2014 | SIGMOD | 5.7586174e-05 |
| 12,316 | Fast and Dynamic OLAP Exploration Using UDFs | 2009 | SIGMOD | 4.1945683e-05 |
| 10,284 | FlowLog: Efficient and Extensible Datalog via Incrementality | 2026 | VLDB | 4.1945683e-05 |
| 6,863 | Declarative Sub-Operators for Universal Data Processing | 2023 | VLDB | 4.905092e-05 |
| 9,763 | The UDFBench Benchmark for General-purpose UDF Queries | 2025 | VLDB | 4.2856106e-05 |
| 8,583 | Efficient Execution of User-Defined Functions in SQL Queries | 2023 | VLDB | 4.4919445e-05 |
| 2,611 | Opening the Black Boxes in Data Flow Optimization | 2012 | VLDB | 8.4536967e-05 |