The Performance of MapReduce: An In-depth Study
Summary: An in-depth performance study of Hadoop MapReduce on a 100-node EC2 cluster that identifies five design factors shaping throughput. Tuning these factors improves performance by a factor of 2.5 to 3.5, narrowing the gap with parallel databases and enabling economical elastic cloud processing. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Dawei Jiang
2. Beng Chin Ooi
3. Lei Shi
4. Sai Wu
Incoming Citations (Sorted by Pagerank)
Showing 21 of 21 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614 |
| 15 | Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters | 2007 | SIGMOD | 0.0010654262 |
| 22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613 |
| 42 | A Comparison of Approaches to Large-Scale Data Analysis | 2009 | SIGMOD | 0.00073498298 |
| 70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166 |
| 157 | HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads | 2009 | VLDB | 0.00040397359 |
| 710 | Performance Tradeoffs in Read-Optimized Databases | 2006 | VLDB | 0.00017765454 |
| 2,208 | Clustera: An Integrated Computation And Data Management System | 2008 | VLDB | 9.2873257e-05 |
| 3,764 | Read-Optimized Databases, In Depth | 2008 | VLDB | 6.7797554e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 868 | Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs | 2011 | VLDB | 0.00015789681 |
| 3,703 | Multi-Query Optimization in MapReduce Framework | 2014 | VLDB | 6.8289978e-05 |
| 2,337 | Efficient Processing of Data Warehousing Queries in a Split Execution Environment | 2011 | SIGMOD | 9.0098186e-05 |
| 9,375 | Efficient Big Data Processing in Hadoop MapReduce | 2012 | VLDB | 4.347384e-05 |
| 3,208 | Column-Oriented Storage Techniques for MapReduce | 2011 | VLDB | 7.3781897e-05 |
| 2,674 | Minimal MapReduce Algorithms | 2013 | SIGMOD | 8.3328645e-05 |
| 157 | HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads | 2009 | VLDB | 0.00040397359 |
| 42 | A Comparison of Approaches to Large-Scale Data Analysis | 2009 | SIGMOD | 0.00073498298 |
| 2,476 | A Platform for Scalable One-Pass Analytics using MapReduce | 2011 | SIGMOD | 8.6960139e-05 |
| 794 | Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) | 2010 | VLDB | 0.00016605103 |