Evaluating Multi-GPU Sorting with Modern Interconnects
Summary: Evaluates multi-GPU sorting across PCIe, NVLink, and NVSwitch interconnects. Proposes a P2P GPU-only sort (P2P) and a heterogeneous sort (HET), both benchmarked on three modern platforms. Reports up to 35x higher P2P throughput with NVSwitch, and speedups over CPU radix sort of up to 14x (P2P) and 9x (HET). On fast interconnects, P2P beats HET by about 1.65x, and overlapping copy with compute does not hide transfer bottlenecks. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Tobias Maltenberger
2. Ivan Ilic
3. Ilin Tolovski
4. Tilmann Rabl
Incoming Citations (Sorted by Pagerank)
Showing 9 of 9 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,247 | Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects | 2022 | SIGMOD | 5.6057839e-05 |
| 6,066 | GPU Database Systems Characterization and Optimization | 2024 | VLDB | 5.2290447e-05 |
| 6,453 | Vortex: Overcoming Memory Capacity Limitations in GPU-Accelerated Large-Scale Data Analytics | 2025 | VLDB | 5.0571108e-05 |
| 7,328 | BOSS - An Architecture for Database Kernel Composition | 2024 | VLDB | 4.7610909e-05 |
| 7,916 | Terabyte-Scale Analytics in the Blink of an Eye | 2026 | VLDB | 4.6173899e-05 |
| 8,478 | Analyzing Vectorized Hash Tables Across CPU Architectures | 2023 | VLDB | 4.5015937e-05 |
| 9,456 | DPDPU: Data Processing with DPUs | 2025 | CIDR | 4.3385595e-05 |
| 9,838 | Efficiently Joining Large Relations on Multi-GPU Systems | 2025 | VLDB | 4.2740344e-05 |
| 10,281 | GPU Acceleration of SQL Analytics on Compressed Data | 2026 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 18 of 18 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.