The Power of Nested Parallelism in Big Data Processing – Hitting Three Flies with One Slap –
Summary: Matryoshka enables nested parallelism in dataflow engines through a two-phase flattening process that turns nested-parallel programs into flat ones, even when they contain inner control flow. It adds nesting primitives and data-aware runtime optimizations, and is validated on PageRank and K-means. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,731 | Babelfish: Efficient Execution of Polyglot Queries | 2022 | VLDB | 5.3502065e-05 |
| 7,306 | DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines | 2022 | CIDR | 4.7678574e-05 |
| 8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05 |
| 8,514 | UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads | 2022 | VLDB | 4.4944285e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 12 of 12 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,882 | Massively Parallel Data Analysis with PACTs on Nephele | 2010 | VLDB | 4.6285796e-05 |
| 1,110 | Parallel Evaluation of Conjunctive Queries | 2011 | PODS | 1.3968198e-04 |
| 10,494 | Nested Parquet Is Flat, Why Not Use It? How To Scan Nested Data With On-the-Fly Key Generation and Joins | 2025 | SIGMOD | 4.1945683e-05 |
| 8,534 | Translation of Array-Based Loops to Distributed Data-Parallel Programs | 2020 | VLDB | 4.4937074e-05 |
| 8,078 | Meta-Dataflows: Efficient Exploratory Dataflow Jobs | 2018 | SIGMOD | 4.5914967e-05 |
| 12,039 | Iterative Parallel Data Processing with Stratosphere: An Inside Look | 2013 | SIGMOD | 4.1945683e-05 |
| 2,818 | Implicit Parallelism through Deep Language Embedding | 2015 | SIGMOD | 8.0665558e-05 |
| 6,658 | Scalable Querying of Nested Data | 2021 | VLDB | 4.9711629e-05 |
| 2,848 | Exploiting Matrix Dependency for Efficient Distributed Matrix Computation | 2015 | SIGMOD | 8.0208832e-05 |
| 2,172 | Spinning Fast Iterative Data Flows | 2012 | VLDB | 9.3706587e-05 |