Database Paper Browser


A Comparison of Join Algorithms for Log Processing in MapReduce

Summary: Evaluates common join strategies for log processing in MapReduce, highlighting platform-specific trade-offs and implementation details. Provides an empirical comparison on a 100-node Hadoop cluster to guide when to apply particular join algorithms in MapReduce workflows. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 4310
Venue: SIGMOD
Year: 2010
Pagerank: 0.00015012242
Overall Rank: 960 | 93.33%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 33 of 33 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
868 | Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs | 2011 | VLDB | 0.00015789681
913 | Tenzing: A SQL Implementation On The MapReduce Framework | 2011 | VLDB | 0.00015408131
1,074 | Processing Theta-Joins using MapReduce | 2011 | SIGMOD | 0.00014260096
1,226 | Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management | 2013 | SIGMOD | 0.00013180799
1,261 | Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce | 2013 | VLDB | 0.00012989236
1,286 | Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams | 2013 | SIGMOD | 0.0001282373
1,863 | Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce | 2010 | VLDB | 0.00010286531
2,337 | Efficient Processing of Data Warehousing Queries in a Split Execution Environment | 2011 | SIGMOD | 9.0098186e-05
2,439 | CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop | 2011 | VLDB | 8.8190594e-05
2,674 | Minimal MapReduce Algorithms | 2013 | SIGMOD | 8.3328645e-05
2,757 | Parallel Data Analysis Directly on Scientific File Formats | 2014 | SIGMOD | 8.1679384e-05
3,115 | Llama: Leveraging Columnar Storage for Scalable Join Processing in the MapReduce Framework | 2011 | SIGMOD | 7.543505e-05
3,382 | Scalable and Adaptive Online Joins | 2014 | VLDB | 7.1597145e-05
4,132 | Advanced Join Strategies for Large-Scale Distributed Computation | 2014 | VLDB | 6.4241067e-05
5,014 | Dynamically Optimizing Queries over Large Scale Data Platforms | 2014 | SIGMOD | 5.7586174e-05
5,105 | Only Aggressive Elephants are Fast Elephants | 2012 | VLDB | 5.694494e-05
5,356 | LogKV: Exploiting Key-Value Stores for Event Log Processing | 2013 | CIDR | 5.5509715e-05
5,532 | A Padded Encoding Scheme to Accelerate Scans by Leveraging Skew | 2015 | SIGMOD | 5.4548897e-05
5,902 | The Communication Complexity of Distributed Set-Joins with Applications to Matrix Multiplication | 2015 | PODS | 5.2796864e-05
6,226 | Cloud-based RDF Data Management | 2014 | SIGMOD | 5.1476331e-05
6,304 | Elastic Pipelining in an In-Memory Database Cluster | 2016 | SIGMOD | 5.1210182e-05
6,507 | Similarity Join over Array Data | 2016 | SIGMOD | 5.0337166e-05
6,619 | Near-Optimal Distributed Band-Joins through Recursive Partitioning | 2020 | SIGMOD | 4.9910152e-05
6,745 | DistME: A Fast and Elastic Distributed Matrix Computation Engine using GPUs | 2019 | SIGMOD | 4.9417155e-05
7,060 | SquirrelJoin: Network-Aware Distributed Join Processing with Lazy Partitioning | 2017 | VLDB | 4.8465382e-05
7,153 | Submodularity of Distributed Join Computation | 2018 | SIGMOD | 4.8153963e-05
7,215 | SyncSignature: A Simple, Efficient, Parallelizable Framework for Tree Similarity Joins | 2023 | VLDB | 4.7985991e-05
7,599 | Quill: Efficient, Transferable, and Rich Analytics at Scale | 2016 | VLDB | 4.7003593e-05
9,115 | MapReduce Algorithms for Big Data Analysis | 2012 | VLDB | 4.3932167e-05
9,375 | Efficient Big Data Processing in Hadoop MapReduce | 2012 | VLDB | 4.347384e-05
11,358 | Scaling Equi-Joins | 2022 | SIGMOD | 4.1945683e-05
11,831 | Logical Aspects of Massively Parallel and Distributed Systems | 2016 | PODS | 4.1945683e-05
11,882 | Parallel Evaluation of Multi-Semi-Joins | 2016 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 5 of 5 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers