SkewTune: Mitigating Skew in MapReduce Applications
Summary: SkewTune automatically mitigates MapReduce skew without extra user input, as a drop-in Hadoop extension. It uses idle-node detection to repartition a straggler's unprocessed data, preserves input order for concatenation-based output reconstruction, and incurs minimal overhead when skew is absent.
(summarized by gpt-5-nano on Feb 09 2026)
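The order-preserving repartitioning described in the summary can be illustrated with a small sketch. It is not SkewTune's implementation: the function names `repartition_in_order` and `mitigate` are hypothetical, the chunks are simply made equal in size (an assumption made here for brevity), and the "idle nodes" are simulated sequentially in one process. It shows only why contiguous, ordered chunks allow the output to be rebuilt by concatenation.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def repartition_in_order(remaining: Sequence[T], idle_nodes: int) -> List[Sequence[T]]:
    """Split a straggler's unprocessed input into contiguous chunks.

    Assumption: chunks are equal-sized (up to one record); this is a
    simplification chosen for the sketch, not the paper's planning step.
    """
    if idle_nodes < 1:
        raise ValueError("need at least one idle node")
    base, extra = divmod(len(remaining), idle_nodes)
    chunks: List[Sequence[T]] = []
    start = 0
    for i in range(idle_nodes):
        size = base + (1 if i < extra else 0)
        chunks.append(remaining[start:start + size])
        start += size
    return chunks


def mitigate(remaining: Sequence[T], idle_nodes: int,
             process: Callable[[T], U]) -> List[U]:
    """Process each chunk separately, then concatenate outputs in chunk order."""
    chunks = repartition_in_order(remaining, idle_nodes)
    # Each chunk would run on a different node; here they run one after another.
    partial_outputs = [[process(record) for record in chunk] for chunk in chunks]
    # Chunks are contiguous and ordered, so plain concatenation restores
    # the output the straggler would have produced on its own.
    return [out for part in partial_outputs for out in part]


if __name__ == "__main__":
    records = list(range(10))
    print(mitigate(records, 3, lambda x: x * x))
```

Because every chunk is a contiguous slice of the original input, the concatenated result is identical to processing the records in a single task, which is the property the summary refers to as concatenation-based output reconstruction.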
- Paper ID: 4509
- Venue: SIGMOD
- Year: 2012
- Pagerank: 0.0001250413
- Overall Rank: 1,334 | 90.73%
- DOI:
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 21 of 21 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 542 | Shark: SQL and Rich Analytics at Scale | 2013 | SIGMOD | 0.00020595648 |
| 1,308 | Upper and Lower Bounds on the Cost of a Map-Reduce Computation | 2013 | VLDB | 0.00012661651 |
| 2,674 | Minimal MapReduce Algorithms | 2013 | SIGMOD | 8.3328645e-05 |
| 4,622 | A General and Parallel Platform for Mining Co-Movement Patterns over Large-scale Trajectories | 2017 | VLDB | 6.0416152e-05 |
| 4,650 | LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data | 2016 | VLDB | 6.0234336e-05 |
| 5,532 | A Padded Encoding Scheme to Accelerate Scans by Leveraging Skew | 2015 | SIGMOD | 5.4548897e-05 |
| 6,131 | Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture | 2013 | SIGMOD | 5.1956688e-05 |
| 6,136 | Scalable Progressive Analytics on Big Data in the Cloud | 2013 | VLDB | 5.1928748e-05 |
| 6,821 | Hadoop's Adolescence: An analysis of Hadoop usage in scientific workloads | 2013 | VLDB | 4.9156923e-05 |
| 7,060 | SquirrelJoin: Network-Aware Distributed Join Processing with Lazy Partitioning | 2017 | VLDB | 4.8465382e-05 |
| 7,153 | Submodularity of Distributed Join Computation | 2018 | SIGMOD | 4.8153963e-05 |
| 7,304 | MRTuner: A Toolkit to Enable Holistic Optimization for MapReduce Jobs | 2014 | VLDB | 4.7684491e-05 |
| 8,401 | Toward Progress Indicators on Steroids for Big Data Systems | 2013 | CIDR | 4.5250912e-05 |
| 8,978 | SpongeFiles: Mitigating Data Skew in MapReduce Using Distributed Memory | 2014 | SIGMOD | 4.417225e-05 |
| 9,001 | The Power of Nested Parallelism in Big Data Processing – Hitting Three Flies with One Slap – | 2021 | SIGMOD | 4.4107627e-05 |
| 9,797 | Dalton: Learned Partitioning for Distributed Data Streams | 2023 | VLDB | 4.2818172e-05 |
| 11,531 | Fangorn: Adaptive Execution Framework for Heterogeneous Workloads on Shared Clusters | 2021 | VLDB | 4.1945683e-05 |
| 11,694 | An Experimental Evaluation of Garbage Collectors on Big Data Applications | 2019 | VLDB | 4.1945683e-05 |
| 11,933 | FP-Hadoop: Efficient Execution of Parallel Jobs Over Skewed Data | 2015 | VLDB | 4.1945683e-05 |
| 11,949 | Big Data Research: Will Industry Solve all the Problems? | 2015 | VLDB | 4.1945683e-05 |
| 12,140 | SkewTune in Action: Mitigating Skew in MapReduce Applications | 2012 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 16 of 16 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 14 | Online Aggregation | 1997 | SIGMOD | 0.0010801504 |
| 413 | HaLoop: Efficient Iterative Data Processing on Large Clusters | 2010 | VLDB | 0.00023904409 |
| 588 | Practical Skew Handling in Parallel Joins | 1992 | VLDB | 0.00019604754 |
| 780 | Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience | 2009 | VLDB | 0.00016775082 |
| 794 | Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) | 2010 | VLDB | 0.00016605103 |
| 861 | A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins | 1991 | VLDB | 0.00015848554 |
| 1,071 | Starfish: A Self-tuning System for Big Data Analytics | 2011 | CIDR | 0.00014312777 |
| 1,110 | Parallel Evaluation of Conjunctive Queries | 2011 | PODS | 0.00013968198 |
| 1,357 | Highly Available, Fault-Tolerant, Parallel Dataflows | 2004 | SIGMOD | 0.00012392275 |
| 1,674 | Adaptive Parallel Aggregation Algorithms | 1995 | SIGMOD | 0.0001094787 |
| 2,208 | Clustera: An Integrated Computation And Data Management System | 2008 | VLDB | 9.2873257e-05 |
| 2,476 | A Platform for Scalable One-Pass Analytics using MapReduce | 2011 | SIGMOD | 8.6960139e-05 |
| 2,575 | A Latency and Fault-Tolerance Optimizer for Online Parallel Query Plans | 2011 | SIGMOD | 8.5133576e-05 |
| 3,893 | Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing | 1996 | VLDB | 6.6584217e-05 |
| 5,568 | Efficient outer join data skew handling in parallel DBMS | 2009 | VLDB | 5.4301489e-05 |
| 12,140 | SkewTune in Action: Mitigating Skew in MapReduce Applications | 2012 | VLDB | 4.1945683e-05 |
Semantically Similar Papers