SpongeFiles: Mitigating Data Skew in MapReduce Using Distributed Memory
Summary: Introduces SpongeFiles, a distributed-memory abstraction for MapReduce that stores spilled data as a logical byte array whose chunks can reside in memory or on disk, mitigating the slowdown that data skew causes. Nearest-capacity routing lets tasks that outgrow their local memory tap spare memory or disk on neighboring nodes, yielding up to 55% overall speedup (85% under contention) on Hadoop/Pig. (summarized by gpt-5-nano on Feb 09 2026)
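The "logical byte array across memory and disk" idea can be illustrated with a minimal in-process sketch. It assumes a simple first-fit policy: data is cut into fixed-size chunks, and each chunk goes to the first storage tier in preference order (local memory, then remote memory, then disk) that still has room. The names `Tier` and `SpongeFileSketch`, the chunk size, the capacities, and the placement policy are hypothetical illustrations, not the paper's API. A real deployment would ship chunks to other nodes over the network, which this model ignores.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """A storage location with a fixed capacity in bytes (hypothetical model)."""
    name: str
    capacity: int
    used: int = 0

    def try_reserve(self, nbytes: int) -> bool:
        # Reserve space only if the whole chunk fits in this tier.
        if self.used + nbytes <= self.capacity:
            self.used += nbytes
            return True
        return False


class SpongeFileSketch:
    """A logical byte array split into fixed-size chunks; each chunk is placed
    in the first tier (in preference order) that has room for it."""

    def __init__(self, tiers: list, chunk_size: int):
        self.tiers = tiers            # ordered by preference, e.g. local mem, remote mem, disk
        self.chunk_size = chunk_size
        self.chunks = []              # list of (tier_name, chunk_bytes), in write order
        self._buffer = bytearray()    # bytes not yet forming a full chunk

    def write(self, data: bytes) -> None:
        self._buffer.extend(data)
        # Emit every full chunk currently in the buffer.
        while len(self._buffer) >= self.chunk_size:
            self._place(bytes(self._buffer[: self.chunk_size]))
            del self._buffer[: self.chunk_size]

    def close(self) -> None:
        # Flush a trailing partial chunk, if any.
        if self._buffer:
            self._place(bytes(self._buffer))
            self._buffer.clear()

    def _place(self, chunk: bytes) -> None:
        for tier in self.tiers:
            if tier.try_reserve(len(chunk)):
                self.chunks.append((tier.name, chunk))
                return
        raise OSError("no tier has room for chunk")

    def read_all(self) -> bytes:
        # Reassemble the logical byte array regardless of where chunks live.
        return b"".join(chunk for _, chunk in self.chunks)

    def placement(self) -> list:
        return [name for name, _ in self.chunks]


# Usage: 20 bytes in 4-byte chunks overflow local memory, then remote memory.
tiers = [Tier("local-mem", 8), Tier("remote-mem", 8), Tier("disk", 1 << 20)]
f = SpongeFileSketch(tiers, chunk_size=4)
f.write(b"abcdefghijklmnopqrst")
f.close()
print(f.placement())  # ['local-mem', 'local-mem', 'remote-mem', 'remote-mem', 'disk']
```

Preferring remote memory over local disk reflects the summary's premise that a spill absorbed by another node's memory is cheaper than one written to disk; the reader sees one contiguous byte array either way.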
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,856 | Liquid: Unifying Nearline and Offline Big Data Integration | 2015 | CIDR | 4.9060615e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614 |
| 53 | PNUTS: Yahoo!'s Hosted Data Serving Platform | 2008 | VLDB | 0.00066144767 |
| 588 | Practical Skew Handling in Parallel Joins | 1992 | VLDB | 0.00019604754 |
| 679 | Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems | 2012 | SIGMOD | 0.00018215154 |
| 780 | Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience | 2009 | VLDB | 0.00016775082 |
| 861 | A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins | 1991 | VLDB | 0.00015848554 |
| 1,334 | SkewTune: Mitigating Skew in MapReduce Applications | 2012 | SIGMOD | 0.0001250413 |
| 1,365 | Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning | 1991 | VLDB | 0.00012368421 |
| 8,464 | Piranha: Optimizing Short Jobs in Hadoop | 2013 | VLDB | 4.5052127e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,888 | Magnet: Push-based Shuffle Service for Large-scale Data Processing | 2020 | VLDB | 5.2873617e-05 |
| 4,061 | Advanced Partitioning Techniques for Massively Distributed Computation | 2012 | SIGMOD | 6.483587e-05 |
| 3,208 | Column-Oriented Storage Techniques for MapReduce | 2011 | VLDB | 7.3781897e-05 |
| 7,511 | Hone: "Scaling Down" Hadoop on Shared-Memory Systems | 2013 | VLDB | 4.7180617e-05 |
| 2,205 | ReStore: Reusing Results of MapReduce Jobs | 2012 | VLDB | 9.2920002e-05 |
| 12,140 | SkewTune in Action: Mitigating Skew in MapReduce Applications | 2012 | VLDB | 4.1945683e-05 |
| 2,476 | A Platform for Scalable One-Pass Analytics using MapReduce | 2011 | SIGMOD | 8.6960139e-05 |
| 11,933 | FP-Hadoop: Efficient Execution of Parallel Jobs Over Skewed Data | 2015 | VLDB | 4.1945683e-05 |
| 1,334 | SkewTune: Mitigating Skew in MapReduce Applications | 2012 | SIGMOD | 0.0001250413 |
| 11,835 | An Efficient MapReduce Cube Algorithm for Varied Data Distributions | 2016 | SIGMOD | 4.1945683e-05 |