Database Paper Browser

HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

Summary: Hybrid architecture blending MapReduce scalability with DBMS-style optimization on shared-nothing hardware. Aims to match parallel DBMS performance while preserving MapReduce fault tolerance, scalability, and flexibility for cloud analytics. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 9957
Venue: VLDB
Year: 2009
Pagerank: 0.00040397359
Overall Rank: 157 | 98.91%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 72 citing papers.

Rank Citing Paper Year Venue Pagerank
109 Dremel: Interactive Analysis of Web-Scale Datasets 2010 VLDB 0.00048186983
413 HaLoop: Efficient Iterative Data Processing on Large Clusters 2010 VLDB 0.00023904409
542 Shark: SQL and Rich Analytics at Scale 2013 SIGMOD 0.00020595648
582 Scalable SPARQL Querying of Large RDF Graphs 2011 VLDB 0.00019723083
794 Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) 2010 VLDB 0.00016605103
868 Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs 2011 VLDB 0.00015789681
913 Tenzing A SQL Implementation On The MapReduce Framework 2011 VLDB 0.00015408131
947 MRShare: Sharing Across Multiple Queries in MapReduce 2010 VLDB 0.00015114576
1,071 Starfish: A Self-tuning System for Big Data Analytics 2011 CIDR 0.00014312777
1,158 Simulation of Database-Valued Markov Chains Using SimSQL 2013 SIGMOD 0.0001361064
1,261 Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce 2013 VLDB 0.00012989236
1,280 Automatic Optimization for MapReduce Programs 2011 VLDB 0.0001285503
1,615 The Performance of MapReduce: An In-depth Study 2010 VLDB 0.00011132319
1,800 epiC: an Extensible and Scalable System for Processing Big Data 2014 VLDB 0.00010512649
1,814 Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing 2014 VLDB 0.00010458107
1,840 dbTouch: Analytics at your Fingertips 2013 CIDR 0.0001034905
1,863 Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce 2010 VLDB 0.00010286531
1,939 From Theory to Practice: Efficient Join Query Evaluation in a Parallel Database System 2015 SIGMOD 0.00010025655
2,127 SQL-on-Hadoop: Full Circle Back to Shared-Nothing Database Architectures 2014 VLDB 9.4863172e-05
2,322 Instant Loading for Main Memory Databases 2013 VLDB 9.034874e-05
2,337 Efficient Processing of Data Warehousing Queries in a Split Execution Environment 2011 SIGMOD 9.0098186e-05
2,413 Automated Partitioning Design in Parallel Database Systems 2011 SIGMOD 8.8672223e-05
2,439 CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop 2011 VLDB 8.8190594e-05
2,458 REX: Recursive, Delta-Based Data-Centric Computation 2012 VLDB 8.7683462e-05
2,674 Minimal MapReduce Algorithms 2013 SIGMOD 8.3328645e-05
3,034 How to Fit when No One Size Fits 2013 CIDR 7.6752083e-05
3,066 HAWQ: A Massively Parallel Processing SQL Engine in Hadoop 2014 SIGMOD 7.6221974e-05
3,115 Llama: Leveraging Columnar Storage for Scalable Join Processing in the MapReduce Framework 2011 SIGMOD 7.543505e-05
3,208 Column-Oriented Storage Techniques for MapReduce 2011 VLDB 7.3781897e-05
3,214 Query Optimization Techniques for Partitioned Tables 2011 SIGMOD 7.3661891e-05
3,517 Integrating Hadoop and Parallel DBMS 2010 SIGMOD 7.0199423e-05
3,562 MISO: Souping Up Big Data Query Processing with a Multistore System 2014 SIGMOD 6.9694564e-05
3,601 Large-Scale Machine Learning at Twitter 2012 SIGMOD 6.9315087e-05
3,710 Optimizing Analytic Data Flows for Multiple Execution Engines 2012 SIGMOD 6.8238962e-05
3,763 Flexible Rule-Based Decomposition and Metadata Independence in Modin: A Parallel Dataframe System 2022 VLDB 6.7801795e-05
3,982 The Myria Big Data Management and Analytics System and Cloud Service 2017 CIDR 6.5651188e-05
4,248 Hyper Dimension Shuffle: Efficient Data Repartition at Petabyte Scale in SCOPE 2019 VLDB 6.3247927e-05
4,326 Fast Queries Over Heterogeneous Data Through Engine Customization 2016 VLDB 6.288323e-05
4,572 The Unified Logging Infrastructure for Data Analytics at Twitter 2012 VLDB 6.0760183e-05
4,573 Clydesdale: Structured Data Processing on Hadoop 2012 SIGMOD 6.0753788e-05
4,696 Asynchronous and Fault-Tolerant Recursive Datalog Evaluation in Shared-Nothing Engines 2015 VLDB 5.9911301e-05
5,441 Using Cloud Functions as Accelerator for Elastic Data Analytics 2023 SIGMOD 5.5028093e-05
5,838 HadoopDB in Action: Building Real World Applications 2010 SIGMOD 5.3059032e-05
5,903 Building Wavelet Histograms on Large Data in MapReduce 2012 VLDB 5.2791351e-05
5,941 Big Graphs: Challenges and Opportunities 2022 VLDB 5.2635446e-05
6,061 Towards Energy-Efficient Database Cluster Design 2012 VLDB 5.2304505e-05
6,232 The Next Generation Operational Data Historian for IoT Based on Informix 2014 SIGMOD 5.1453711e-05
6,773 Replication at the Speed of Change – a Fast, Scalable Replication Solution for Near Real-Time HTAP Processing 2020 VLDB 4.9303548e-05
6,823 Are We Experiencing a Big Data Bubble? 2014 SIGMOD 4.9149276e-05
7,270 Oracle In-Database Hadoop: When MapReduce Meets RDBMS 2012 SIGMOD 4.7813984e-05

Outgoing Citations (Sorted by Pagerank)

Showing 7 of 7 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
