Database Paper Browser

Submodularity of Distributed Join Computation

Summary: Studies distributed equi-joins under join-attribute skew, where fine-grained partitioning trades input duplication for reduced load variance. Minimizing load variance under a constraint on average load is cast as a monotone submodular knapsack problem, which enables near-optimal greedy solutions. The approach covers general load models and deterministic assignment, and is evaluated experimentally. (summarized by gpt-5-nano on Feb 09 2026)
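
The greedy idea mentioned in the summary can be illustrated with a generic sketch. This is not the paper's algorithm: the objective here is a toy coverage function standing in for any monotone submodular function, and the names `greedy_knapsack`, `COVERS`, `COSTS`, and `coverage` are hypothetical. The sketch uses the standard cost-benefit rule (pick the item with the best marginal gain per unit cost that still fits the budget) and then compares the result with the best single affordable item, since the ratio rule alone can perform poorly in the worst case.

```python
from typing import Callable, Dict, FrozenSet, Set


def greedy_knapsack(
    costs: Dict[str, float],
    value: Callable[[FrozenSet[str]], float],
    budget: float,
) -> Set[str]:
    """Cost-benefit greedy for a monotone submodular `value` under a budget.

    `costs` maps each candidate item to a positive cost (an assumption of
    this sketch). Returns the better of the greedy set and the best single
    affordable item.
    """
    chosen: Set[str] = set()
    spent = 0.0
    while True:
        base = value(frozenset(chosen))
        best, best_ratio = None, 0.0
        for item, cost in costs.items():
            # Skip items already chosen or no longer affordable.
            if item in chosen or spent + cost > budget:
                continue
            # Marginal gain per unit of cost.
            ratio = (value(frozenset(chosen | {item})) - base) / cost
            if ratio > best_ratio:
                best, best_ratio = item, ratio
        if best is None:  # nothing affordable adds value
            break
        chosen.add(best)
        spent += costs[best]

    # Safeguard: compare against the best single affordable item.
    affordable = [i for i, c in costs.items() if c <= budget]
    if affordable:
        single = max(affordable, key=lambda i: value(frozenset({i})))
        if value(frozenset({single})) > value(frozenset(chosen)):
            return {single}
    return chosen


# Toy instance (illustrative data only): each item covers some elements,
# and the value of a selection is the number of distinct elements covered.
COVERS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7, 8}, "d": {1}}
COSTS = {"a": 2.0, "b": 1.0, "c": 3.0, "d": 1.5}


def coverage(selected: FrozenSet[str]) -> float:
    covered = set()
    for name in selected:
        covered |= COVERS[name]
    return float(len(covered))
```

On this toy instance, `greedy_knapsack(COSTS, coverage, 4.0)` selects `b` and `c`, while a budget of 3.0 triggers the single-item safeguard and returns only `c`.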

Paper ID: 5448
Venue: SIGMOD
Year: 2018
Pagerank: 4.8153963e-05
Overall Rank: 7,153 | 50.24%
DOI: 10.1145/3183713.3183728

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
6,619 | Near-Optimal Distributed Band-Joins through Recursive Partitioning | 2020 | SIGMOD | 4.9910152e-05
7,836 | NOCAP: Near-Optimal Correlation-Aware Partitioning Joins | 2023 | SIGMOD | 4.6380835e-05
8,462 | Topology-aware Parallel Data Processing: Models, Algorithms and Systems at Scale | 2020 | CIDR | 4.5056381e-05
11,672 | Block as a Value for SQL over NoSQL | 2019 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 26 of 26 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
78 | Multiprocessor Hash-Based Join Algorithms | 1985 | VLDB | 0.00056413752
232 | A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment | 1989 | SIGMOD | 0.00032122485
447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171
588 | Practical Skew Handling in Parallel Joins | 1992 | VLDB | 0.00019604754
861 | A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins | 1991 | VLDB | 0.00015848554
868 | Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs | 2011 | VLDB | 0.00015789681
960 | A Comparison of Join Algorithms for Log Processing in MapReduce | 2010 | SIGMOD | 0.00015012242
1,063 | Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines | 1990 | VLDB | 0.00014362773
1,074 | Processing Theta-Joins using MapReduce* | 2011 | SIGMOD | 0.00014260096
1,232 | Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC) | 1990 | VLDB | 0.00013147188
1,334 | SkewTune: Mitigating Skew in MapReduce Applications | 2012 | SIGMOD | 0.0001250413
1,365 | Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning | 1991 | VLDB | 0.00012368421
1,915 | Handling Data Skew in Parallel Joins in Shared-Nothing Systems | 2008 | SIGMOD | 0.00010104123
1,931 | Efficient Processing of k Nearest Neighbor Joins using MapReduce | 2012 | VLDB | 0.00010040427
1,939 | From Theory to Practice: Efficient Join Query Evaluation in a Parallel Database System | 2015 | SIGMOD | 0.00010025655
2,212 | Skew in Parallel Query Processing | 2014 | PODS | 9.2771827e-05
2,526 | Track Join: Distributed Joins with Minimal Network Traffic | 2014 | SIGMOD | 8.5968612e-05
3,141 | ClusterJoin: A Similarity Joins Framework using Map-Reduce | 2014 | VLDB | 7.4829448e-05
3,382 | Scalable and Adaptive Online Joins | 2014 | VLDB | 7.1597145e-05
3,528 | Distributed Data Deduplication | 2016 | VLDB | 7.0066139e-05
3,893 | Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing | 1996 | VLDB | 6.6584217e-05
4,132 | Advanced Join Strategies for Large-Scale Distributed Computation | 2014 | VLDB | 6.4241067e-05
4,147 | Exploiting MapReduce-based Similarity Joins | 2012 | SIGMOD | 6.4096022e-05
5,118 | AdaptDB: Adaptive Partitioning for Distributed Joins | 2017 | VLDB | 5.6820984e-05
5,960 | Skew-Aware Join Optimization for Array Databases | 2015 | SIGMOD | 5.2559595e-05
6,241 | Scaling Similarity Joins over Tree-Structured Data | 2015 | VLDB | 5.1411469e-05

Semantically Similar Papers