Database Paper Browser

SkewTune: Mitigating Skew in MapReduce Applications

Summary: SkewTune is a drop-in extension to Hadoop that mitigates skew in MapReduce jobs automatically, with no extra input from the user. When it detects an idle node, it repartitions a straggler task's unprocessed data. The repartitioning preserves input order, so the final output can be reconstructed by concatenation. Overhead is minimal when skew is absent. (summarized by gpt-5-nano on Feb 09 2026)
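
The order-preserving repartitioning described in the summary can be illustrated with a minimal Python sketch. It is not SkewTune's implementation: the function names (`repartition_remaining`, `run_with_mitigation`) are hypothetical, and splitting the remaining records evenly by count is an assumption made for illustration, since the summary does not say how partition sizes are chosen.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def repartition_remaining(records: List[T], processed: int,
                          idle_workers: int) -> List[List[T]]:
    """Split a straggler's unprocessed records into contiguous chunks.

    Chunks are contiguous and kept in input order, so per-chunk outputs
    can later be concatenated. Even sizing is an illustrative assumption.
    """
    remaining = records[processed:]
    if idle_workers <= 0 or not remaining:
        return []
    n = min(idle_workers, len(remaining))
    base, extra = divmod(len(remaining), n)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n):
        # The first `extra` chunks take one additional record.
        size = base + (1 if i < extra else 0)
        chunks.append(remaining[start:start + size])
        start += size
    return chunks


def run_with_mitigation(records: List[T], processed: int, idle_workers: int,
                        fn: Callable[[T], U]) -> List[U]:
    """Simulate a straggler handing its remaining input to idle workers."""
    # Output the straggler produced before mitigation started.
    merged = [fn(r) for r in records[:processed]]
    # Each chunk stands in for one mitigator task; here they run serially.
    for chunk in repartition_remaining(records, processed, idle_workers):
        merged.extend(fn(r) for r in chunk)
    return merged


if __name__ == "__main__":
    data = list(range(10))
    print(repartition_remaining(data, processed=4, idle_workers=3))
    print(run_with_mitigation(data, 4, 3, lambda x: x * 2))
```

Because every chunk is a contiguous slice taken in input order, concatenating the per-chunk outputs gives the same result as running the task without mitigation.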

Paper ID
4509
Venue
SIGMOD
Year
2012
Pagerank
0.0001250413
Overall Rank
1,334 | 90.73%
DOI
-

Incoming Citations (Sorted by Pagerank)

Showing 21 of 21 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
542 | Shark: SQL and Rich Analytics at Scale | 2013 | SIGMOD | 0.00020595648
1,308 | Upper and Lower Bounds on the Cost of a Map-Reduce Computation | 2013 | VLDB | 0.00012661651
2,674 | Minimal MapReduce Algorithms | 2013 | SIGMOD | 8.3328645e-05
4,622 | A General and Parallel Platform for Mining Co-Movement Patterns over Large-scale Trajectories | 2017 | VLDB | 6.0416152e-05
4,650 | LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data | 2016 | VLDB | 6.0234336e-05
5,532 | A Padded Encoding Scheme to Accelerate Scans by Leveraging Skew | 2015 | SIGMOD | 5.4548897e-05
6,131 | Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture | 2013 | SIGMOD | 5.1956688e-05
6,136 | Scalable Progressive Analytics on Big Data in the Cloud | 2013 | VLDB | 5.1928748e-05
6,821 | Hadoop's Adolescence: An analysis of Hadoop usage in scientific workloads | 2013 | VLDB | 4.9156923e-05
7,060 | SquirrelJoin: Network-Aware Distributed Join Processing with Lazy Partitioning | 2017 | VLDB | 4.8465382e-05
7,153 | Submodularity of Distributed Join Computation | 2018 | SIGMOD | 4.8153963e-05
7,304 | MRTuner: A Toolkit to Enable Holistic Optimization for MapReduce Jobs | 2014 | VLDB | 4.7684491e-05
8,401 | Toward Progress Indicators on Steroids for Big Data Systems | 2013 | CIDR | 4.5250912e-05
8,978 | SpongeFiles: Mitigating Data Skew in MapReduce Using Distributed Memory | 2014 | SIGMOD | 4.417225e-05
9,001 | The Power of Nested Parallelism in Big Data Processing – Hitting Three Flies with One Slap – | 2021 | SIGMOD | 4.4107627e-05
9,797 | Dalton: Learned Partitioning for Distributed Data Streams | 2023 | VLDB | 4.2818172e-05
11,531 | Fangorn: Adaptive Execution Framework for Heterogeneous Workloads on Shared Clusters | 2021 | VLDB | 4.1945683e-05
11,694 | An Experimental Evaluation of Garbage Collectors on Big Data Applications | 2019 | VLDB | 4.1945683e-05
11,933 | FP-Hadoop: Efficient Execution of Parallel Jobs Over Skewed Data | 2015 | VLDB | 4.1945683e-05
11,949 | Big Data Research: Will Industry Solve all the Problems? | 2015 | VLDB | 4.1945683e-05
12,140 | SkewTune in Action: Mitigating Skew in MapReduce Applications | 2012 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 16 of 16 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers