Database Paper Browser

Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)

Summary: Introduces Hadoop++, a UDF-driven acceleration layer that speeds up MapReduce tasks in Hadoop without modifying the framework itself. Injecting UDFs at key points adds indexing and faster joins, outperforming both Hadoop and HadoopDB while remaining compatible with future changes to Hadoop. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 10101
Venue: VLDB
Year: 2010
Pagerank: 0.00016605103
Overall Rank: 794 | 94.48%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 39 of 39 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
868 | Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs | 2011 | VLDB | 0.00015789681
979 | Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads | 2012 | VLDB | 0.0001488055
1,071 | Starfish: A Self-tuning System for Big Data Analytics | 2011 | CIDR | 0.00014312777
1,280 | Automatic Optimization for MapReduce Programs | 2011 | VLDB | 0.0001285503
1,334 | SkewTune: Mitigating Skew in MapReduce Applications | 2012 | SIGMOD | 0.0001250413
1,534 | PerfXplain: Debugging MapReduce Job Performance | 2012 | VLDB | 0.00011468393
2,337 | Efficient Processing of Data Warehousing Queries in a Split Execution Environment | 2011 | SIGMOD | 9.0098186e-05
2,439 | CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop | 2011 | VLDB | 8.8190594e-05
2,674 | Minimal MapReduce Algorithms | 2013 | SIGMOD | 8.3328645e-05
2,803 | TriAD: A Distributed Shared-Nothing RDF Engine based on Asynchronous Message Passing | 2014 | SIGMOD | 8.0940362e-05
3,062 | Efficient Multi-way Theta-Join Processing Using MapReduce | 2012 | VLDB | 7.6343994e-05
3,115 | Llama: Leveraging Columnar Storage for Scalable Join Processing in the MapReduce Framework | 2011 | SIGMOD | 7.543505e-05
3,129 | Scalable Big Graph Processing in MapReduce | 2014 | SIGMOD | 7.5008242e-05
3,208 | Column-Oriented Storage Techniques for MapReduce | 2011 | VLDB | 7.3781897e-05
3,571 | Lightning Fast and Space Efficient Inequality Joins | 2015 | VLDB | 6.9580858e-05
3,710 | Optimizing Analytic Data Flows for Multiple Execution Engines | 2012 | SIGMOD | 6.8238962e-05
4,248 | Hyper Dimension Shuffle: Efficient Data Repartition at Petabyte Scale in SCOPE | 2019 | VLDB | 6.3247927e-05
4,572 | The Unified Logging Infrastructure for Data Analytics at Twitter | 2012 | VLDB | 6.0760183e-05
4,573 | Clydesdale: Structured Data Processing on Hadoop | 2012 | SIGMOD | 6.0753788e-05
5,105 | Only Aggressive Elephants are Fast Elephants | 2012 | VLDB | 5.694494e-05
5,118 | AdaptDB: Adaptive Partitioning for Distributed Joins | 2017 | VLDB | 5.6820984e-05
5,376 | Holistic Indexing in Main-memory Column-stores | 2015 | SIGMOD | 5.5417421e-05
5,558 | A Hadoop Based Distributed Loading Approach to Parallel Data Warehouses | 2011 | SIGMOD | 5.4341353e-05
5,903 | Building Wavelet Histograms on Large Data in MapReduce | 2012 | VLDB | 5.2791351e-05
6,173 | Exploiting Soft and Hard Correlations in Big Data Query Optimization | 2016 | VLDB | 5.1699414e-05
7,476 | Lachesis: Automatic Partitioning for UDF-Centric Analytics | 2021 | VLDB | 4.7188928e-05
7,907 | Petabyte-Scale Row-Level Operations in Data Lakehouses | 2024 | VLDB | 4.6205839e-05
7,918 | Indexing HDFS Data in PDW: Splitting the data from the index | 2014 | VLDB | 4.6170838e-05
7,958 | CARTILAGE: Adding Flexibility to the Hadoop Skeleton | 2013 | SIGMOD | 4.613363e-05
8,002 | Pangea: Monolithic Distributed Storage for Data Analytics | 2019 | VLDB | 4.6088289e-05
8,084 | ScalaGiST: Scalable Generalized Search Trees for MapReduce Systems [Innovative Systems Paper] | 2014 | VLDB | 4.5902866e-05
8,464 | Piranha: Optimizing Short Jobs in Hadoop | 2013 | VLDB | 4.5052127e-05
9,347 | Rank Join Queries in NoSQL Databases | 2014 | VLDB | 4.3526718e-05
9,375 | Efficient Big Data Processing in Hadoop MapReduce | 2012 | VLDB | 4.347384e-05
11,987 | DGFIndex for Smart Grid: Enhancing Hive with a Cost-Effective Multidimensional Range Index | 2014 | VLDB | 4.1945683e-05
12,028 | D-Hive: Data Bees Pollinating RDF, Text, and Time | 2013 | CIDR | 4.1945683e-05
12,030 | How Achaeans Would Construct Columns in Troy | 2013 | CIDR | 4.1945683e-05
12,071 | Mosquito: Another One Bites the Data Upload STream | 2013 | VLDB | 4.1945683e-05
13,487 | RAFT at Work: Speeding-Up MapReduce Applications under Task and Node Failures | 2011 | SIGMOD | -

Outgoing Citations (Sorted by Pagerank)

Showing 10 of 10 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
