Database Paper Browser

A Comparison of Approaches to Large-Scale Data Analysis

Summary: Compares MapReduce (MR) with parallel DBMSs for large-scale data analysis, placing MR in the context of decades of parallel-SQL work. A benchmark on a 100-node cluster finds that the DBMSs take longer to load and tune but run the tasks faster than MR; the paper discusses the causes and the implications for future systems. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 4117
Venue: SIGMOD
Year: 2009
Pagerank: 0.00073498298
Overall Rank: 42 | 99.71%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 72 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801
70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166
140 | The MADlib Analytics Library or MAD Skills, the SQL | 2012 | VLDB | 0.00042270404
157 | HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads | 2009 | VLDB | 0.00040397359
167 | The Snowflake Elastic Data Warehouse | 2016 | SIGMOD | 0.00039180521
413 | HaLoop: Efficient Iterative Data Processing on Large Clusters | 2010 | VLDB | 0.00023904409
447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171
542 | Shark: SQL and Rich Analytics at Scale | 2013 | SIGMOD | 0.00020595648
794 | Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) | 2010 | VLDB | 0.00016605103
933 | An Evaluation of Alternative Architectures for Transaction Processing in the Cloud | 2010 | SIGMOD | 0.00015232301
947 | MRShare: Sharing Across Multiple Queries in MapReduce | 2010 | VLDB | 0.00015114576
953 | Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance | 2010 | VLDB | 0.00015095431
960 | A Comparison of Join Algorithms for Log Processing in MapReduce | 2010 | SIGMOD | 0.00015012242
979 | Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads | 2012 | VLDB | 0.0001488055
1,074 | Processing Theta-Joins using MapReduce* | 2011 | SIGMOD | 0.00014260096
1,098 | Trill: A High-Performance Incremental Query Processor for Diverse Analytics | 2015 | VLDB | 0.00014114442
1,261 | Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce | 2013 | VLDB | 0.00012989236
1,280 | Automatic Optimization for MapReduce Programs | 2011 | VLDB | 0.0001285503
1,615 | The Performance of MapReduce: An In-depth Study | 2010 | VLDB | 0.00011132319
1,727 | BigBench: Towards an Industry Standard Benchmark for Big Data Analytics | 2013 | SIGMOD | 0.00010740936
1,800 | epiC: an Extensible and Scalable System for Processing Big Data | 2014 | VLDB | 0.00010512649
1,987 | TIRAMOLA: Elastic NoSQL Provisioning Through a Cloud Management Platform | 2012 | SIGMOD | 9.8506975e-05
2,067 | HippogriffDB: Balancing I/O and GPU Bandwidth in Big Data Analytics | 2016 | VLDB | 9.6392739e-05
2,322 | Instant Loading for Main Memory Databases | 2013 | VLDB | 9.034874e-05
2,337 | Efficient Processing of Data Warehousing Queries in a Split Execution Environment | 2011 | SIGMOD | 9.0098186e-05
2,350 | An Intermediate Representation for Optimizing Machine Learning Pipelines | 2019 | VLDB | 8.9788641e-05
2,439 | CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop | 2011 | VLDB | 8.8190594e-05
2,476 | A Platform for Scalable One-Pass Analytics using MapReduce | 2011 | SIGMOD | 8.6960139e-05
2,488 | Shark: Fast Data Analysis Using Coarse-grained Distributed Memory | 2012 | SIGMOD | 8.6683713e-05
2,526 | Track Join: Distributed Joins with Minimal Network Traffic | 2014 | SIGMOD | 8.5968612e-05
2,575 | A Latency and Fault-Tolerance Optimizer for Online Parallel Query Plans | 2011 | SIGMOD | 8.5133576e-05
2,747 | Stubby: A Transformation-based Optimizer for MapReduce Workflows | 2012 | VLDB | 8.1828918e-05
3,066 | HAWQ: A Massively Parallel Processing SQL Engine in Hadoop | 2014 | SIGMOD | 7.6221974e-05
3,180 | Energy Management for MapReduce Clusters | 2010 | VLDB | 7.4302009e-05
3,208 | Column-Oriented Storage Techniques for MapReduce | 2011 | VLDB | 7.3781897e-05
3,247 | Can the Elephants Handle the NoSQL Onslaught? | 2012 | VLDB | 7.3260831e-05
3,265 | RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! - | 2018 | VLDB | 7.3083672e-05
3,343 | Comparative Evaluation of Big-Data Systems on Scientific Image Analytics Workloads | 2017 | VLDB | 7.1967343e-05
3,517 | Integrating Hadoop and Parallel DBMS | 2010 | SIGMOD | 7.0199423e-05
3,562 | MISO: Souping Up Big Data Query Processing with a Multistore System | 2014 | SIGMOD | 6.9694564e-05
3,655 | CloudRAMSort: Fast and Efficient Large-Scale Distributed RAM Sort on Shared-Nothing Cluster | 2012 | SIGMOD | 6.8718304e-05
4,573 | Clydesdale: Structured Data Processing on Hadoop | 2012 | SIGMOD | 6.0753788e-05
5,014 | Dynamically Optimizing Queries over Large Scale Data Platforms | 2014 | SIGMOD | 5.7586174e-05
5,105 | Only Aggressive Elephants are Fast Elephants | 2012 | VLDB | 5.694494e-05
5,294 | GLADE: Big Data Analytics Made Easy | 2012 | SIGMOD | 5.5810654e-05
5,338 | Fast In-Memory SQL Analytics on Typed Graphs | 2017 | VLDB | 5.5629772e-05
5,838 | HadoopDB in Action: Building Real World Applications | 2010 | SIGMOD | 5.3059032e-05
5,903 | Building Wavelet Histograms on Large Data in MapReduce | 2012 | VLDB | 5.2791351e-05
6,061 | Towards Energy-Efficient Database Cluster Design | 2012 | VLDB | 5.2304505e-05
6,232 | The Next Generation Operational Data Historian for IoT Based on Informix | 2014 | SIGMOD | 5.1453711e-05

Outgoing Citations (Sorted by Pagerank)

Showing 7 of 7 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
