A Distributed Multi-GPU System for Fast Graph Processing
Summary: Lux is a distributed multi-GPU graph-processing system that exploits aggregate bandwidth and locality. It provides two execution models with cheap dynamic load balancing, and a runtime model tunes its configurations. It reports up to 20x speedup over shared-memory systems and up to 100x over distributed systems.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 11744
- Venue: VLDB
- Year: 2018
- Pagerank: 6.8567044e-05
- Overall Rank: 3,670 | 74.47%
- DOI: 10.14778/3157794.3157799
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 10 of 10 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 3,025 | NeutronStar: Distributed GNN Training with Hybrid Dependency Management | 2022 | SIGMOD | 7.6906935e-05 |
| 4,002 | MG-Join: A Scalable Join for Massively Parallel Multi-GPU Architectures | 2021 | SIGMOD | 6.545665e-05 |
| 4,522 | GPU-based Graph Traversal on Compressed Graphs | 2019 | SIGMOD | 6.1146374e-05 |
| 6,985 | CompressGraph: Efficient Parallel Graph Analytics with Rule-Based Compression | 2023 | SIGMOD | 4.8729387e-05 |
| 7,225 | Self-adaptive Graph Traversal on GPUs | 2021 | SIGMOD | 4.7956162e-05 |
| 9,471 | Nezha: An Efficient Distributed Graph Processing System on Heterogeneous Hardware | 2025 | SIGMOD | 4.3341665e-05 |
| 10,044 | ACGraph: An Efficient Asynchronous Out-of-Core Graph Processing Framework | 2026 | SIGMOD | 4.1945683e-05 |
| 10,473 | Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing | 2025 | SIGMOD | 4.1945683e-05 |
| 10,514 | cuMatch: A GPU-based Memory-Efficient Worst-case Optimal Join Processing Method for Subgraph Queries with Complex Patterns | 2025 | SIGMOD | 4.1945683e-05 |
| 11,026 | Improving Graph Compression for Efficient Resource-Constrained Graph Analytics | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 1,103 | Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture | 2021 | VLDB | 0.00014025101 |
| 2,330 | Concurrent Analytical Query Processing with GPUs | 2014 | VLDB | 9.0192228e-05 |
| 5,017 | TurboGraph++: A Scalable and Fast Graph Analytics System | 2018 | SIGMOD | 5.7574792e-05 |
| 3,834 | GTS: A Fast and Scalable Graph Processing Method based on Streaming Topology to GPUs | 2016 | SIGMOD | 6.7173094e-05 |
| 3,597 | Parallel Local Graph Clustering | 2016 | VLDB | 6.9345175e-05 |
| 4,577 | Accelerating Dynamic Graph Analytics on GPUs | 2018 | VLDB | 6.0709631e-05 |
| 1,877 | Large-Scale Distributed Graph Computing Systems: An Experimental Evaluation | 2015 | VLDB | 0.00010236803 |
| 8,115 | Start Late or Finish Early: A Distributed Graph Processing System with Redundancy Reduction | 2019 | VLDB | 4.5816155e-05 |
| 5,799 | CGgraph: An Ultra-fast Graph Processing System on Modern Commodity CPU-GPU Co-processor | 2024 | VLDB | 5.3219334e-05 |
| 10,863 | Towards Sufficient GPU-accelerated Dynamic Graph Management: Survey and Experiment | 2025 | VLDB | 4.1945683e-05 |