MG-Join: A Scalable Join for Massively Parallel Multi-GPU Architectures
Summary: MG-Join proposes a scalable partitioned hash join for single-machine multi-GPU architectures. Its adaptive multi-hop cross-GPU routing minimizes congestion and achieves up to 97% bisection-bandwidth utilization. The paper reports join speedups of up to 2.5x over prior hash join implementations and TPC-H gains of up to 4.5x over OmniSci.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 6145
- Venue: SIGMOD
- Year: 2021
- Pagerank: 6.545665e-05
- Overall Rank: 4,002 | 72.17%
- DOI: 10.1145/3448016.3457254
Incoming Citations (Sorted by Pagerank)
Showing 20 of 20 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,019 | Orchestrating Data Placement and Query Execution in Heterogeneous CPU-GPU DBMS | 2022 | VLDB | 5.7559197e-05 |
| 5,040 | Tile-based Lightweight Integer Compression in GPU | 2022 | SIGMOD | 5.7425187e-05 |
| 5,247 | Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects | 2022 | SIGMOD | 5.6057839e-05 |
| 5,426 | RTIndeX: Exploiting Hardware-Accelerated GPU Raytracing for Database Indexing | 2023 | VLDB | 5.5096704e-05 |
| 6,066 | GPU Database Systems Characterization and Optimization | 2024 | VLDB | 5.2290447e-05 |
| 6,223 | Distributed GPU Joins on Fast RDMA-capable Networks | 2023 | SIGMOD | 5.1496398e-05 |
| 6,453 | Vortex: Overcoming Memory Capacity Limitations in GPU-Accelerated Large-Scale Data Analytics | 2025 | VLDB | 5.0571108e-05 |
| 7,155 | Evaluating Multi-GPU Sorting with Modern Interconnects | 2022 | SIGMOD | 4.8149812e-05 |
| 7,328 | BOSS - An Architecture for Database Kernel Composition | 2024 | VLDB | 4.7610909e-05 |
| 7,568 | Powerful GPUs or Fast Interconnects: Analyzing Relational Workloads on Modern GPUs | 2025 | VLDB | 4.7084322e-05 |
| 7,751 | Efficiently Processing Joins and Grouped Aggregations on GPUs | 2025 | SIGMOD | 4.6603427e-05 |
| 7,916 | Terabyte-Scale Analytics in the Blink of an Eye | 2026 | VLDB | 4.6173899e-05 |
| 8,846 | Scaling your Hybrid CPU-GPU DBMS to Multiple GPUs | 2024 | VLDB | 4.4372012e-05 |
| 9,142 | Design and Analysis of a Processing-in-DIMM Join Algorithm: A Case Study with UPMEM DIMMs | 2023 | SIGMOD | 4.3853149e-05 |
| 9,838 | Efficiently Joining Large Relations on Multi-GPU Systems | 2025 | VLDB | 4.2740344e-05 |
| 10,253 | Scalable GPU Acceleration of Scalar Functions in Analytical Databases: Compilation, Benchmarking, and Optimization | 2026 | VLDB | 4.1945683e-05 |
| 10,749 | Scaling GPU-Accelerated Databases beyond GPU Memory Size | 2025 | VLDB | 4.1945683e-05 |
| 10,993 | SPID-Join: A Skew-resistant Processing-in-DIMM Join Algorithm Exploiting the Bank- and Rank-level Parallelisms of DIMMs | 2024 | SIGMOD | 4.1945683e-05 |
| 11,020 | Accelerating Merkle Patricia Trie with GPU | 2024 | VLDB | 4.1945683e-05 |
| 11,358 | Scaling Equi-Joins | 2022 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 15 of 15 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 775 | Relational Joins on Graphics Processors | 2008 | SIGMOD | 0.00016823862 |
| 1,206 | Rack-Scale In-Memory Join Processing using RDMA | 2015 | SIGMOD | 0.00013281657 |
| 1,273 | The Yin and Yang of Processing Data Warehousing Queries on GPU Devices | 2013 | VLDB | 0.00012912938 |
| 1,804 | An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory | 2016 | SIGMOD | 0.00010501185 |
| 2,519 | Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture | 2013 | VLDB | 8.6078505e-05 |
| 2,526 | Track Join: Distributed Joins with Minimal Network Traffic | 2014 | SIGMOD | 8.5968612e-05 |
| 2,651 | HetExchange: Encapsulating heterogeneous CPU-GPU parallelism in JIT compiled engines | 2019 | VLDB | 8.3694317e-05 |
| 3,363 | CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers | 2019 | VLDB | 7.1731921e-05 |
| 3,670 | A Distributed Multi-GPU System for Fast Graph Processing | 2018 | VLDB | 6.8567044e-05 |
| 4,085 | In-Cache Query Co-Processing on Coupled CPU-GPU Architectures | 2015 | VLDB | 6.4620277e-05 |
| 4,363 | Hardware-conscious Query Processing in GPU-accelerated Analytical Engines | 2019 | CIDR | 6.2552614e-05 |
| 5,197 | Data-Parallel Query Processing on Non-Uniform Data | 2020 | VLDB | 5.6347409e-05 |
| 5,578 | Ocelot/HyPE: Optimized Data Processing on Heterogeneous Hardware | 2014 | VLDB | 5.4252837e-05 |
| 6,369 | Improving Execution Efficiency of Just-in-time Compilation based Query Processing on GPUs | 2021 | VLDB | 5.0936663e-05 |
| 7,060 | SquirrelJoin: Network-Aware Distributed Join Processing with Lazy Partitioning | 2017 | VLDB | 4.8465382e-05 |
Semantically Similar Papers