Hyper Dimension Shuffle: Efficient Data Repartition at Petabyte Scale in SCOPE
Summary: Hyper Dimension Shuffle introduces a recursive, divide-and-conquer shuffle for petabyte-scale data in SCOPE. Recursive partitioning with intermediate aggregations yields quasilinear shuffling complexity and bounded fan-out/fan-in per task, avoiding the quadratic number of producer-consumer connections of a direct all-to-all shuffle. (summarized by gpt-5-nano on Feb 09 2026)
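The recursive scheme described in the summary can be illustrated with a small simulation. This is a minimal sketch under our own assumptions, not the SCOPE implementation. It assumes the number of partitions is n = k^d and treats each partition index as a d-digit base-k number. In round i, every task forwards a record to the peer whose index differs only in digit i, with that digit set to the record's destination digit. Intermediate aggregation is modeled simply as concatenating the records that arrive at a task. Under these assumptions each task has fan-out and fan-in of at most k per round, and the total number of connections is at most d·n·k = n·k·log_k n, compared with n² for a direct all-to-all shuffle. The names `hyper_dimension_shuffle`, `to_digits`, and `from_digits` are hypothetical.

```python
from collections import Counter


def to_digits(x: int, k: int, d: int) -> list[int]:
    """Base-k digits of x, least significant first, padded to length d."""
    ds = []
    for _ in range(d):
        ds.append(x % k)
        x //= k
    return ds


def from_digits(ds: list[int], k: int) -> int:
    """Inverse of to_digits."""
    x = 0
    for digit in reversed(ds):
        x = x * k + digit
    return x


def hyper_dimension_shuffle(partitions, k, d):
    """Route each (dest, value) record to partition `dest` in d rounds.

    Hypothetical sketch: assumes n = k**d partitions indexed as points of a
    d-dimensional grid with side k. Round `dim` rewrites only digit `dim` of a
    record's current position, so each task talks to at most k peers per round.
    Returns (outputs, max_fan_out, max_fan_in, total_connections).
    """
    n = k ** d
    assert len(partitions) == n, "sketch assumes exactly k**d partitions"
    current = [list(p) for p in partitions]
    max_out = max_in = total = 0
    for dim in range(d):
        nxt = [[] for _ in range(n)]
        edges = set()  # distinct (source task, target task) connections
        for src, records in enumerate(current):
            src_digits = to_digits(src, k, d)
            for dest, value in records:
                tgt_digits = list(src_digits)
                # Fix one more digit to match the destination; earlier digits
                # already match because previous rounds fixed them.
                tgt_digits[dim] = to_digits(dest, k, d)[dim]
                tgt = from_digits(tgt_digits, k)
                nxt[tgt].append((dest, value))  # "aggregation" = concatenation
                edges.add((src, tgt))
        fan_out = Counter(s for s, _ in edges)
        fan_in = Counter(t for _, t in edges)
        max_out = max(max_out, max(fan_out.values(), default=0))
        max_in = max(max_in, max(fan_in.values(), default=0))
        total += len(edges)
        current = nxt
    return current, max_out, max_in, total


if __name__ == "__main__":
    k, d = 4, 3
    n = k ** d
    # Dense worst case: every input partition holds one record per output.
    inputs = [[(dest, (src, dest)) for dest in range(n)] for src in range(n)]
    outputs, fo, fi, conns = hyper_dimension_shuffle(inputs, k, d)
    assert all(dest == i for i, part in enumerate(outputs) for dest, _ in part)
    print(f"n={n}: direct all-to-all needs {n * n} connections, "
          f"hyper shuffle used {conns}; max fan-out {fo}, max fan-in {fi}")
```

In this sketch, choosing more rounds (larger d, smaller k) lowers the per-task fan-out and fan-in, at the cost of moving each record d times instead of once.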
Authors
1. Shi Qiao
2. Adrian Nicoara
3. Jin Sun
4. Marc Friedman
5. Hiren Patel
6. Jaliya Ekanayake
Incoming Citations (Sorted by PageRank)
Showing 8 of 8 citing papers.
| Rank | Citing Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 2,062 | Dremel: A Decade of Interactive SQL Analysis at Web Scale | 2020 | VLDB | 9.6481955e-05 |
| 5,888 | Magnet: Push-based Shuffle Service for Large-scale Data Processing | 2020 | VLDB | 5.2873617e-05 |
| 6,261 | The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward | 2021 | VLDB | 5.1350714e-05 |
| 6,673 | Incorporating Super-Operators in Big-Data Query Optimizers | 2020 | VLDB | 4.966799e-05 |
| 6,757 | KEA: Tuning an Exabyte-Scale Data Infrastructure | 2021 | SIGMOD | 4.9372134e-05 |
| 7,778 | Runtime Variation in Big Data Analytics | 2023 | SIGMOD | 4.653651e-05 |
| 8,506 | New Query Optimization Techniques in the Spark Engine of Azure Synapse | 2022 | VLDB | 4.4957661e-05 |
| 8,758 | Hyperspace: The Indexing Subsystem of Azure Synapse | 2021 | VLDB | 4.456315e-05 |
Outgoing Citations (Sorted by PageRank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613 |
| 157 | HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads | 2009 | VLDB | 0.00040397359 |
| 794 | Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) | 2010 | VLDB | 0.00016605103 |
| 1,152 | Blink and It's Done: Interactive Queries on Very Large Data | 2012 | VLDB | 0.00013645792 |
| 1,543 | NUMA-aware algorithms: the case of data shuffling | 2013 | CIDR | 0.0001145318 |
| 2,488 | Shark: Fast Data Analysis Using Coarse-grained Distributed Memory | 2012 | SIGMOD | 8.6683713e-05 |
| 3,535 | Scaling Spark in the Real World: Performance and Usability | 2015 | VLDB | 6.9992495e-05 |
| 4,174 | Computation Reuse in Analytics Job Service at Microsoft | 2018 | SIGMOD | 6.3856219e-05 |
| 7,599 | Quill: Efficient, Transferable, and Rich Analytics at Scale | 2016 | VLDB | 4.7003593e-05 |