The Performance of MapReduce: An In-depth Study

Summary: An in-depth performance study of Hadoop MapReduce on a 100-node Amazon EC2 cluster that identifies five design factors affecting performance. Tuning these factors improves performance by a factor of 2.5 to 3.5, narrowing the gap with parallel database systems and enabling economical elastic cloud processing. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 10021
Venue: VLDB
Year: 2010
Pagerank: 0.00011137225
Overall Rank: 1,615 | 88.78%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 21 of 21 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
866	Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs	2011	VLDB	0.00015771189
1,048	Starfish: A Self-tuning System for Big Data Analytics	2011	CIDR	0.00014442178
1,533	PerfXplain: Debugging MapReduce Job Performance	2012	VLDB	0.00011462982
1,797	epiC: an Extensible and Scalable System for Processing Big Data	2014	VLDB	0.00010502488
1,927	Efficient Processing of k Nearest Neighbor Joins using MapReduce	2012	VLDB	0.00010062395
2,441	CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop	2011	VLDB	8.8106295e-05
2,476	A Platform for Scalable One-Pass Analytics using MapReduce	2011	SIGMOD	8.6907971e-05
3,068	Efficient Multi-way Theta-Join Processing Using MapReduce	2012	VLDB	7.6241861e-05
3,070	HAWQ: A Massively Parallel Processing SQL Engine in Hadoop	2014	SIGMOD	7.6157391e-05
3,120	Llama: Leveraging Columnar Storage for Scalable Join Processing in the MapReduce Framework	2011	SIGMOD	7.533894e-05
3,193	Column-Oriented Storage Techniques for MapReduce	2011	VLDB	7.4073719e-05
3,714	Optimizing Analytic Data Flows for Multiple Execution Engines	2012	SIGMOD	6.8176849e-05
5,110	Only Aggressive Elephants are Fast Elephants	2012	VLDB	5.6868273e-05
5,908	Building Wavelet Histograms on Large Data in MapReduce	2012	VLDB	5.2731311e-05
6,177	Exploiting Soft and Hard Correlations in Big Data Query Optimization	2016	VLDB	5.1646971e-05
6,265	Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems	2019	VLDB	5.1294788e-05
8,086	ScalaGiST: Scalable Generalized Search Trees for MapReduce Systems [Innovative Systems Paper]	2014	VLDB	4.5858883e-05
8,460	Piranha: Optimizing Short Jobs in Hadoop	2013	VLDB	4.5008938e-05
9,360	Efficient Big Data Processing in Hadoop MapReduce	2012	VLDB	4.3476444e-05
11,699	An Experimental Evaluation of Garbage Collectors on Big Data Applications	2019	VLDB	4.1905499e-05
11,995	DGFIndex for Smart Grid: Enhancing Hive with a Cost-Effective Multidimensional Range Index	2014	VLDB	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 9 of 9 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
3	Pig Latin: A Not-So-Foreign Language for Data Processing	2008	SIGMOD	0.0024217964
15	Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters	2007	SIGMOD	0.0010668335
22	SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets	2008	VLDB	0.00084679526
42	A Comparison of Approaches to Large-Scale Data Analysis	2009	SIGMOD	0.00073570328
70	Hive - A Warehousing Solution Over a Map-Reduce Framework	2009	VLDB	0.00059744625
158	HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads	2009	VLDB	0.00040401371
708	Performance Tradeoffs in Read-Optimized Databases	2006	VLDB	0.00017753572
2,213	Clustera: An Integrated Computation And Data Management System	2008	VLDB	9.2796992e-05
3,761	Read-Optimized Databases, In Depth	2008	VLDB	6.7777865e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
866	Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs	2011	VLDB	0.00015771189
3,709	Multi-Query Optimization in MapReduce Framework	2014	VLDB	6.8211506e-05
2,340	Efficient Processing of Data Warehousing Queries in a Split Execution Environment	2011	SIGMOD	9.001663e-05
9,360	Efficient Big Data Processing in Hadoop MapReduce	2012	VLDB	4.3476444e-05
3,193	Column-Oriented Storage Techniques for MapReduce	2011	VLDB	7.4073719e-05
2,714	Minimal MapReduce Algorithms	2013	SIGMOD	8.2426646e-05
158	HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads	2009	VLDB	0.00040401371
42	A Comparison of Approaches to Large-Scale Data Analysis	2009	SIGMOD	0.00073570328
2,476	A Platform for Scalable One-Pass Analytics using MapReduce	2011	SIGMOD	8.6907971e-05
789	Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)	2010	VLDB	0.00016602215