Distributed GPU Joins on Fast RDMA-capable Networks
Summary: Pipelined distributed GPU joins on fast RDMA networks overlap shuffling with build/probe to hide GPU idle time. RDMA/GPUDirect-based algorithms scale to arbitrarily large tables and show up to 6x faster full queries versus CPU-only joins.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 6532
- Venue: SIGMOD
- Year: 2023
- Pagerank: 5.1496398e-05
- Overall Rank: 6,223 | 56.71%
- DOI: 10.1145/3588709
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 12 of 12 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,568 | Powerful GPUs or Fast Interconnects: Analyzing Relational Workloads on Modern GPUs | 2025 | VLDB | 4.7084322e-05 |
| 7,751 | Efficiently Processing Joins and Grouped Aggregations on GPUs | 2025 | SIGMOD | 4.6603427e-05 |
| 7,916 | Terabyte-Scale Analytics in the Blink of an Eye | 2026 | VLDB | 4.6173899e-05 |
| 8,649 | Zero-sided RDMA: Network-driven Data Shuffling for Disaggregated Heterogeneous Cloud DBMSs | 2024 | SIGMOD | 4.4762914e-05 |
| 8,846 | Scaling your Hybrid CPU-GPU DBMS to Multiple GPUs | 2024 | VLDB | 4.4372012e-05 |
| 9,456 | DPDPU: Data Processing with DPUs | 2025 | CIDR | 4.3385595e-05 |
| 9,838 | Efficiently Joining Large Relations on Multi-GPU Systems | 2025 | VLDB | 4.2740344e-05 |
| 10,143 | Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management | 2026 | SIGMOD | 4.1945683e-05 |
| 10,253 | Scalable GPU Acceleration of Scalar Functions in Analytical Databases: Compilation, Benchmarking, and Optimization | 2026 | VLDB | 4.1945683e-05 |
| 10,749 | Scaling GPU-Accelerated Databases beyond GPU Memory Size | 2025 | VLDB | 4.1945683e-05 |
| 10,856 | Analyzing Near-Network Hardware Acceleration with Co-Processing on DPUs | 2025 | VLDB | 4.1945683e-05 |
| 10,981 | Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and Quality | 2024 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 20 of 20 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 404 | Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited | 2014 | VLDB | 0.00024143076 |
| 775 | Relational Joins on Graphics Processors | 2008 | SIGMOD | 0.00016823862 |
| 930 | Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious SIMD Sort | 2010 | SIGMOD | 0.00015238545 |
| 1,206 | Rack-Scale In-Memory Join Processing using RDMA | 2015 | SIGMOD | 0.00013281657 |
| 1,273 | The Yin and Yang of Processing Data Warehousing Queries on GPU Devices | 2013 | VLDB | 0.00012912938 |
| 1,361 | The End of Slow Networks: It's Time for a Redesign | 2016 | VLDB | 0.00012379741 |
| 1,819 | The End of a Myth: Distributed Transactions Can Scale | 2017 | VLDB | 0.00010429773 |
| 1,852 | Designing Distributed Tree-based Index Structures for Fast RDMA-capable Networks | 2019 | SIGMOD | 0.00010322492 |
| 2,040 | A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics | 2020 | SIGMOD | 9.7057698e-05 |
| 2,287 | Pipelined Query Processing in Coprocessor Environments | 2018 | SIGMOD | 9.0972606e-05 |
| 2,519 | Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture | 2013 | VLDB | 8.6078505e-05 |
| 2,916 | Quantifying TPC-H Choke Points and Their Optimizations | 2020 | VLDB | 7.9068048e-05 |
| 3,305 | Robust Query Processing in Co-Processor-accelerated Databases | 2016 | SIGMOD | 7.2460965e-05 |
| 3,327 | Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects | 2020 | SIGMOD | 7.2205738e-05 |
| 3,443 | Distributed Join Algorithms on Thousands of Cores | 2017 | VLDB | 7.0887214e-05 |
| 3,696 | Why it is time for a HyPE: A Hybrid Query Processing Engine for Efficient GPU Coprocessing in DBMS | 2013 | VLDB | 6.834483e-05 |
| 3,898 | Efficient Join Algorithms For Large Database Tables in a Multi-GPU Environment | 2021 | VLDB | 6.6551268e-05 |
| 4,002 | MG-Join: A Scalable Join for Massively Parallel Multi-GPU Architectures | 2021 | SIGMOD | 6.545665e-05 |
| 4,483 | DFI: The Data Flow Interface for High-Speed Networks | 2021 | SIGMOD | 6.148188e-05 |
| 5,247 | Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects | 2022 | SIGMOD | 5.6057839e-05 |
Semantically Similar Papers