Database Paper Browser


Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs

Summary: The first cost-based optimizer for MapReduce programs. It tackles the large space of configuration parameters even though the map and reduce functions are black boxes. A profiler that works on unmodified programs and a what-if engine that estimates costs enable data-driven optimization. A prototype is built on Hadoop and evaluated on programs from multiple domains. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 10174
Venue: VLDB
Year: 2011
Pagerank: 0.00015789681
Overall Rank: 868 | 93.97%
DOI: -



Incoming Citations (Sorted by Pagerank)

Showing 37 of 37 citing papers.

Overall Rank | Citing Paper | Year | Venue | Pagerank
979 | Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads | 2012 | VLDB | 0.0001488055
1,019 | Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques | 2012 | VLDB | 0.00014625603
1,402 | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML | 2014 | VLDB | 0.00012180605
1,977 | Split Query Processing in Polybase | 2013 | SIGMOD | 9.8824589e-05
2,529 | Pregelix: Big(ger) Graph Analytics on A Dataflow Engine | 2015 | VLDB | 8.5940768e-05
2,611 | Opening the Black Boxes in Data Flow Optimization | 2012 | VLDB | 8.4536967e-05
2,667 | Cumulon: Optimizing Statistical Data Analysis in the Cloud | 2013 | SIGMOD | 8.3413995e-05
2,674 | Minimal MapReduce Algorithms | 2013 | SIGMOD | 8.3328645e-05
2,747 | Stubby: A Transformation-based Optimizer for MapReduce Workflows | 2012 | VLDB | 8.1828918e-05
2,928 | WANalytics: Analytics for a Geo-Distributed Data-Intensive World | 2015 | CIDR | 7.8812874e-05
3,429 | Real-time Workload Pattern Analysis for Large-scale Cloud Databases | 2023 | VLDB | 7.1010535e-05
3,703 | Multi-Query Optimization in MapReduce Framework | 2014 | VLDB | 6.8289978e-05
4,437 | Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics | 2015 | VLDB | 6.1907793e-05
4,802 | Resource Elasticity for Large-Scale Machine Learning | 2015 | SIGMOD | 5.9114415e-05
5,105 | Only Aggressive Elephants are Fast Elephants | 2012 | VLDB | 5.694494e-05
5,607 | HYPER: Hypothetical Reasoning With What-If and How-To Queries Using a Probabilistic Causal Approach | 2022 | SIGMOD | 5.4137872e-05
5,688 | PREDIcT: Towards Predicting the Runtime of Large Scale Iterative Analytics | 2013 | VLDB | 5.3702808e-05
6,268 | Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems | 2019 | VLDB | 5.133857e-05
6,498 | Memory-Aware Framework for Efficient Second-Order Random Walk on Large Graphs | 2020 | SIGMOD | 5.0392468e-05
6,565 | Toward Interpretable and Actionable Data Analysis with Explanations and Causality | 2022 | VLDB | 5.0081626e-05
6,757 | KEA: Tuning an Exabyte-Scale Data Infrastructure | 2021 | SIGMOD | 4.9372134e-05
6,821 | Hadoop's Adolescence: An analysis of Hadoop usage in scientific workloads | 2013 | VLDB | 4.9156923e-05
7,097 | Fast Multi-Column Sorting in Main-Memory Column-Stores | 2016 | SIGMOD | 4.8336115e-05
7,153 | Submodularity of Distributed Join Computation | 2018 | SIGMOD | 4.8153963e-05
7,304 | MRTuner: A Toolkit to Enable Holistic Optimization for MapReduce Jobs | 2014 | VLDB | 4.7684491e-05
8,358 | MapReduce Programming and Cost-based Optimization? Crossing this Chasm with Starfish | 2011 | VLDB | 4.5372998e-05
8,758 | Hyperspace: The Indexing Subsystem of Azure Synapse | 2021 | VLDB | 4.456315e-05
8,924 | QMapper for Smart Grid: Migrating SQL-based Application to Hive | 2015 | SIGMOD | 4.427232e-05
9,375 | Efficient Big Data Processing in Hadoop MapReduce | 2012 | VLDB | 4.347384e-05
9,504 | Supporting Scalable Analytics with Latency Constraints | 2015 | VLDB | 4.3341665e-05
9,736 | UDAO: A Next-Generation Unified Data Analytics Optimizer | 2019 | VLDB | 4.2942813e-05
11,368 | PACk: An Efficient Partition-based Distributed Agglomerative Hierarchical Clustering Algorithm for Deduplication | 2022 | VLDB | 4.1945683e-05
11,635 | Automated Performance Management for the Big Data Stack | 2019 | CIDR | 4.1945683e-05
11,668 | Cost-Effective, Workload-Adaptive Migration of Big Data Applications to the Cloud | 2019 | SIGMOD | 4.1945683e-05
11,694 | An Experimental Evaluation of Garbage Collectors on Big Data Applications | 2019 | VLDB | 4.1945683e-05
12,059 | Workload Management for Big Data Analytics | 2013 | SIGMOD | 4.1945683e-05
12,101 | Optimization Strategies for A/B Testing on HADOOP | 2013 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 8 of 8 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers