Database Paper Browser

A Comparison of Approaches to Large-Scale Data Analysis

Summary: Compares MapReduce (MR) with parallel DBMSs for large-scale data analysis, relating MR to decades of work on parallel SQL systems. A benchmark on a 100-node cluster finds that the parallel DBMSs take longer to load data and tune but run the tasks faster than MR; the paper discusses the causes of this gap and the implications for future systems. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 4117
Venue: SIGMOD
Year: 2009
Pagerank: 0.00073498298
Overall Rank: 42 | 99.71%
DOI: -

[Figure: Incoming Non-self Citations Over Time]

Incoming Citations (Sorted by Pagerank)

Showing 22 of 72 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
6,268 | Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems | 2019 | VLDB | 5.133857e-05
6,282 | Cheetah: Accelerating Database Queries with Switch Pruning | 2020 | SIGMOD | 5.128797e-05
6,636 | Adapting TPC-C Benchmark to Measure Performance of Multi-Document Transactions in MongoDB | 2019 | VLDB | 4.9820843e-05
6,665 | Cheap Data Analytics using Cold Storage Devices | 2016 | VLDB | 4.9697181e-05
6,821 | Hadoop's Adolescence: An analysis of Hadoop usage in scientific workloads | 2013 | VLDB | 4.9156923e-05
7,294 | Optimization for iterative queries on MapReduce | 2014 | VLDB | 4.773119e-05
7,477 | Benchmarking Spreadsheet Systems | 2020 | SIGMOD | 4.7188671e-05
7,599 | Quill: Efficient, Transferable, and Rich Analytics at Scale | 2016 | VLDB | 4.7003593e-05
7,877 | Emerging Trends in the Enterprise Data Analytics: Connecting Hadoop and DB2 Warehouse | 2011 | SIGMOD | 4.6297559e-05
7,902 | Building Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis | 2015 | CIDR | 4.6215911e-05
7,958 | CARTILAGE: Adding Flexibility to the Hadoop Skeleton | 2013 | SIGMOD | 4.613363e-05
8,084 | ScalaGiST: Scalable Generalized Search Trees for MapReduce Systems [Innovative Systems Paper] | 2014 | VLDB | 4.5902866e-05
9,004 | DataGarage: Warehousing Massive Performance Data on Commodity Servers | 2010 | VLDB | 4.4102022e-05
9,375 | Efficient Big Data Processing in Hadoop MapReduce | 2012 | VLDB | 4.347384e-05
9,490 | Auto-BI: Automatically Build BI-Models Leveraging Local Join Prediction and Global Schema Graph | 2023 | VLDB | 4.3341665e-05
11,126 | High-Performance Spatial Data Analytics: Systematic R&D for Scale-Out and Scale-Up Solutions from the Past to Now | 2024 | VLDB | 4.1945683e-05
11,573 | Towards Scalable UDTFs in Noria | 2020 | SIGMOD | 4.1945683e-05
11,690 | Integration of Large-Scale Data Processing Systems and Traditional Parallel Database Technology | 2019 | VLDB | 4.1945683e-05
11,694 | An Experimental Evaluation of Garbage Collectors on Big Data Applications | 2019 | VLDB | 4.1945683e-05
11,708 | RAPID: In-Memory Analytical Query Processing Engine with Extreme Performance per Watt | 2018 | SIGMOD | 4.1945683e-05
11,894 | Building Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis | 2015 | CIDR | 4.1945683e-05
11,987 | DGFIndex for Smart Grid: Enhancing Hive with a Cost-Effective Multidimensional Range Index | 2014 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 7 of 7 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
