Efficient Join Algorithms For Large Database Tables in a Multi-GPU Environment
Summary: Presents multi-GPU join algorithms for very large tables in three designs: nested-loop, global sort-merge, and hybrid. The work addresses CPU–GPU data-transfer bottlenecks, demonstrates scalability, and reports speedups of up to 25× over multi-core CPU and up to 2.8× over single-GPU implementations. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Ran Rui
2. Hao Li
3. Yi-Cheng Tu
Incoming Citations (Sorted by Pagerank)
Showing 20 of 20 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Overall Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 351 | Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs | 2009 | VLDB | 0.0002636504 |
| 404 | Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited | 2014 | VLDB | 0.00024143076 |
| 540 | Design and Evaluation of Main Memory Hash Join Algorithms for Multi-core CPUs | 2011 | SIGMOD | 0.0002063443 |
| 585 | Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems | 2012 | VLDB | 0.00019706145 |
| 775 | Relational Joins on Graphics Processors | 2008 | SIGMOD | 0.00016823862 |
| 1,016 | Memory-Efficient Hash Joins | 2015 | VLDB | 0.00014638492 |
| 1,273 | The Yin and Yang of Processing Data Warehousing Queries on GPU Devices | 2013 | VLDB | 0.00012912938 |
| 2,519 | Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture | 2013 | VLDB | 0.000086078505 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 351 | Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs | 2009 | VLDB | 0.0002636504 |
| 2,640 | Design and Evaluation of Parallel Pipelined Join Algorithms | 1987 | SIGMOD | 0.000083924401 |
| 1,939 | From Theory to Practice: Efficient Join Query Evaluation in a Parallel Database System | 2015 | SIGMOD | 0.00010025655 |
| 2,044 | Optimization of Multi-Way Join Queries for Parallel Execution | 1991 | VLDB | 0.000096953608 |
| 6,056 | Efficient Massively Parallel Join Optimization for Large Queries | 2022 | SIGMOD | 0.000052321475 |
| 6,223 | Distributed GPU Joins on Fast RDMA-capable Networks | 2023 | SIGMOD | 0.000051496398 |
| 5,247 | Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects | 2022 | SIGMOD | 0.000056057839 |
| 7,751 | Efficiently Processing Joins and Grouped Aggregations on GPUs | 2025 | SIGMOD | 0.000046603427 |
| 775 | Relational Joins on Graphics Processors | 2008 | SIGMOD | 0.00016823862 |
| 9,838 | Efficiently Joining Large Relations on Multi-GPU Systems | 2025 | VLDB | 0.000042740344 |