Hadoop's Adolescence: An analysis of Hadoop usage in scientific workloads
Summary: The first user-centered measurement study of Hadoop use in scientific workloads, covering three clusters. Users underuse Hadoop features and tools even as resource usage is diverse and workloads are interactive and iterative, which signals a strong need for automatic tuning and ecosystem optimizations. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Kai Ren
2. YongChul Kwon
3. Magdalena Balazinska
4. Bill Howe
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,873 | An Architecture for Compiling UDF-centric Workflows | 2015 | VLDB | 0.00010253002 |
| 2,965 | SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment | 2016 | SIGMOD | 7.8059273e-05 |
| 3,809 | Changing the Face of Database Cloud Services with Personalized Service Level Agreements | 2015 | CIDR | 6.7409982e-05 |
| 6,075 | Opportunistic Physical Design for Big Data Analytics | 2014 | SIGMOD | 5.223901e-05 |
| 7,689 | ROBUS: Fair Cache Allocation for Data-parallel Workloads | 2017 | SIGMOD | 4.6765769e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614 |
| 4 | Pregel: A System for Large-Scale Graph Processing | 2010 | SIGMOD | 0.0019005923 |
| 42 | A Comparison of Approaches to Large-Scale Data Analysis | 2009 | SIGMOD | 0.00073498298 |
| 868 | Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs | 2011 | VLDB | 0.00015789681 |
| 979 | Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads | 2012 | VLDB | 0.0001488055 |
| 1,334 | SkewTune: Mitigating Skew in MapReduce Applications | 2012 | SIGMOD | 0.0001250413 |
| 1,534 | PerfXplain: Debugging MapReduce Job Performance | 2012 | VLDB | 0.00011468393 |
| 2,035 | Generating Example Data for Dataflow Programs | 2009 | SIGMOD | 9.7149269e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,439 | CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop | 2011 | VLDB | 8.8190594e-05 |
| 6,104 | Automating Distributed Tiered Storage Management in Cluster Computing | 2020 | VLDB | 5.2080102e-05 |
| 9,375 | Efficient Big Data Processing in Hadoop MapReduce | 2012 | VLDB | 4.347384e-05 |
| 2,476 | A Platform for Scalable One-Pass Analytics using MapReduce | 2011 | SIGMOD | 8.6960139e-05 |
| 7,511 | Hone: "Scaling Down" Hadoop on Shared-Memory Systems | 2013 | VLDB | 4.7180617e-05 |
| 4,857 | The "Big Data" Ecosystem at LinkedIn | 2013 | SIGMOD | 5.8736144e-05 |
| 2,337 | Efficient Processing of Data Warehousing Queries in a Split Execution Environment | 2011 | SIGMOD | 9.0098186e-05 |
| 794 | Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) | 2010 | VLDB | 0.00016605103 |
| 1,615 | The Performance of MapReduce: An In-depth Study | 2010 | VLDB | 0.00011132319 |
| 12,101 | Optimization Strategies for A/B Testing on HADOOP | 2013 | VLDB | 4.1945683e-05 |