Submodularity of Distributed Join Computation
Summary: Studies distributed equi-join computation under join-attribute skew, where fine-grained partitioning trades input duplication for reduced load variance. Minimizing load variance subject to a constraint on average (expected) load can be cast as monotone submodular maximization under a knapsack constraint, which admits near-optimal greedy solutions. The analysis covers general load models as well as deterministic load assignment, and is supported by experiments.
(summarized by gpt-5-nano on Feb 09 2026)
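The summary's central claim is that the optimization is monotone submodular under a knapsack constraint, which is what makes greedy selection effective. As a hedged illustration only, the sketch below implements the textbook cost-benefit greedy for a monotone submodular objective under a budget, returning the better of the greedy set and the best single affordable element (a standard safeguard). The weighted-coverage objective, the helper names `coverage_objective` and `budgeted_greedy`, and all candidates, costs, and budgets are hypothetical stand-ins. One could loosely read candidates as partitioning refinements and cost as added input duplication, but that mapping is an assumption; the paper's actual variance objective and algorithm are not reproduced here.

```python
"""Hedged sketch: cost-benefit greedy for a monotone submodular objective
under a knapsack (budget) constraint. Objective and data are hypothetical."""
from typing import Callable, Dict, Iterable, Set

Objective = Callable[[Set[str]], float]


def coverage_objective(covers: Dict[str, Set[str]], weights: Dict[str, float]) -> Objective:
    """Hypothetical stand-in objective: weighted coverage (monotone, submodular)."""
    def f(selection: Set[str]) -> float:
        covered: Set[str] = set()
        for e in selection:
            covered |= covers[e]
        return sum(weights[i] for i in covered)
    return f


def budgeted_greedy(ground: Iterable[str], cost: Dict[str, float],
                    f: Objective, budget: float) -> Set[str]:
    """Add elements by marginal gain per unit cost while they fit the budget,
    then return the better of that set and the best single affordable element.
    Assumes every cost is strictly positive."""
    candidates = list(ground)
    chosen: Set[str] = set()
    spent = 0.0
    while True:
        base = f(chosen)
        best, best_ratio = None, 0.0
        # sorted() makes tie-breaking deterministic.
        for e in sorted(set(candidates) - chosen):
            if spent + cost[e] > budget:
                continue  # spending only grows, so this element can never fit later
            ratio = (f(chosen | {e}) - base) / cost[e]
            if ratio > best_ratio:
                best, best_ratio = e, ratio
        if best is None:
            break  # no affordable element has positive marginal gain
        chosen.add(best)
        spent += cost[best]
    # Safeguard: one expensive, valuable element can beat the ratio-greedy set.
    affordable = [e for e in candidates if cost[e] <= budget]
    if affordable:
        best_single = max(affordable, key=lambda e: f({e}))
        if f({best_single}) > f(chosen):
            return {best_single}
    return chosen


if __name__ == "__main__":
    # Pure ratio-greedy picks "cheap" first (ratio 2.0 vs 1.9), which leaves
    # too little budget for "big"; the single-element safeguard recovers it.
    f = coverage_objective({"cheap": {"a"}, "big": {"b"}}, {"a": 2, "b": 19})
    print(budgeted_greedy(["cheap", "big"], {"cheap": 1, "big": 10}, f, budget=10))
    # prints {'big'}
```

The single-element comparison is kept because ratio-greedy alone can be arbitrarily poor under a knapsack constraint, as the usage example shows.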
- Paper ID: 5448
- Venue: SIGMOD
- Year: 2018
- Pagerank: 4.8153963e-05
- Overall Rank: 7,153 | 50.24%
- DOI: 10.1145/3183713.3183728
Incoming Citations: 4 citing papers.
Outgoing Citations (Sorted by Pagerank): 26 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 78 | Multiprocessor Hash-Based Join Algorithms | 1985 | VLDB | 0.00056413752 |
| 232 | A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment | 1989 | SIGMOD | 0.00032122485 |
| 447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171 |
| 588 | Practical Skew Handling in Parallel Joins | 1992 | VLDB | 0.00019604754 |
| 861 | A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins | 1991 | VLDB | 0.00015848554 |
| 868 | Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs | 2011 | VLDB | 0.00015789681 |
| 960 | A Comparison of Join Algorithms for Log Processing in MapReduce | 2010 | SIGMOD | 0.00015012242 |
| 1,063 | Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines | 1990 | VLDB | 0.00014362773 |
| 1,074 | Processing Theta-Joins using MapReduce | 2011 | SIGMOD | 0.00014260096 |
| 1,232 | Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC) | 1990 | VLDB | 0.00013147188 |
| 1,334 | SkewTune: Mitigating Skew in MapReduce Applications | 2012 | SIGMOD | 0.0001250413 |
| 1,365 | Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning | 1991 | VLDB | 0.00012368421 |
| 1,915 | Handling Data Skew in Parallel Joins in Shared-Nothing Systems | 2008 | SIGMOD | 0.00010104123 |
| 1,931 | Efficient Processing of k Nearest Neighbor Joins using MapReduce | 2012 | VLDB | 0.00010040427 |
| 1,939 | From Theory to Practice: Efficient Join Query Evaluation in a Parallel Database System | 2015 | SIGMOD | 0.00010025655 |
| 2,212 | Skew in Parallel Query Processing | 2014 | PODS | 9.2771827e-05 |
| 2,526 | Track Join: Distributed Joins with Minimal Network Traffic | 2014 | SIGMOD | 8.5968612e-05 |
| 3,141 | ClusterJoin: A Similarity Joins Framework using Map-Reduce | 2014 | VLDB | 7.4829448e-05 |
| 3,382 | Scalable and Adaptive Online Joins | 2014 | VLDB | 7.1597145e-05 |
| 3,528 | Distributed Data Deduplication | 2016 | VLDB | 7.0066139e-05 |
| 3,893 | Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing | 1996 | VLDB | 6.6584217e-05 |
| 4,132 | Advanced Join Strategies for Large-Scale Distributed Computation | 2014 | VLDB | 6.4241067e-05 |
| 4,147 | Exploiting MapReduce-based Similarity Joins | 2012 | SIGMOD | 6.4096022e-05 |
| 5,118 | AdaptDB: Adaptive Partitioning for Distributed Joins | 2017 | VLDB | 5.6820984e-05 |
| 5,960 | Skew-Aware Join Optimization for Array Databases | 2015 | SIGMOD | 5.2559595e-05 |
| 6,241 | Scaling Similarity Joins over Tree-Structured Data | 2015 | VLDB | 5.1411469e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,659 | Fast and Effective Distribution-Key Recommendation for Amazon Redshift | 2020 | VLDB | 4.9710856e-05 |
| 11,797 | Runtime Optimization of Join Location in Parallel Data Management Systems | 2017 | VLDB | 4.1945683e-05 |
| 3,893 | Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing | 1996 | VLDB | 6.6584217e-05 |
| 8,061 | Efficient Computation of Quantiles over Joins | 2023 | PODS | 4.5943269e-05 |
| 2,925 | Shared Workload Optimization | 2014 | VLDB | 7.888494e-05 |
| 9,581 | Sharing Aggregate Computation for Distributed Queries | 2007 | SIGMOD | 4.3227214e-05 |
| 4,460 | Performance Analysis of a Load Balancing Hash-Join Algorithm for a Shared Memory Multiprocessor | 1991 | VLDB | 6.1635864e-05 |
| 1,953 | Distributed Evaluation of Subgraph Queries Using Worst-case Optimal Low-Memory Dataflows | 2018 | VLDB | 9.9665955e-05 |
| 11,890 | Let's Rethink Join Optimization in Distributed Systems | 2015 | CIDR | 4.1945683e-05 |
| 6,619 | Near-Optimal Distributed Band-Joins through Recursive Partitioning | 2020 | SIGMOD | 4.9910152e-05 |