NUMA-aware algorithms: the case of data shuffling
Summary: Shows that NUMA effects strongly affect data shuffling on multi-socket multicore servers: a naive shuffle can be up to 3× slower than NUMA-aware variants. The best-performing variants combine thread binding, NUMA-aware thread allocation, and relaxed global coordination. The paper argues that this kind of algorithmic redesign becomes essential as socket counts and memory heterogeneity grow.
(summarized by gpt-5-mini on Feb 09 2026)
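The coordination idea in the summary can be illustrated with a small simulation. This is a hypothetical sketch, not the paper's algorithm. It assumes one worker thread per target partition and one source partition per NUMA node. It also runs sequentially in Python instead of pinning real threads, because thread binding would need OS-level affinity calls that are left out here. `visit_order` and `shuffle` are names introduced only for illustration. In the naive schedule every thread scans the sources in the same order, so in each round all threads read from the same node. In the coordinated schedule thread t starts at source t, so each round is a permutation and, when the thread count equals the source count, no source partition is read by more than one thread at a time.

```python
from collections import Counter

def visit_order(thread_id, num_sources, coordinated):
    """Order in which a hypothetical worker thread scans the source partitions."""
    if coordinated:
        # Staggered start: thread t reads source (t + r) % S in round r.
        return [(thread_id + r) % num_sources for r in range(num_sources)]
    # Naive: every thread scans sources 0, 1, ..., S-1 in lockstep.
    return list(range(num_sources))

def shuffle(sources, num_targets, coordinated=True):
    """Sequentially simulate a shuffle where thread t collects records for target t.

    sources[s] is a list of (target, value) pairs stored on source partition s.
    Returns the per-target record lists and the largest number of threads
    that read the same source partition in any single round.
    """
    num_sources = len(sources)
    targets = [[] for _ in range(num_targets)]
    orders = [visit_order(t, num_sources, coordinated) for t in range(num_targets)]
    worst_contention = 0
    for r in range(num_sources):
        # Count how many threads hit each source partition in this round.
        readers = Counter(order[r] for order in orders)
        worst_contention = max(worst_contention, max(readers.values()))
        for t, order in enumerate(orders):
            s = order[r]
            # Pull only the records destined for this thread's target partition.
            targets[t].extend(value for dest, value in sources[s] if dest == t)
    return targets, worst_contention

if __name__ == "__main__":
    sources = [[(0, "a"), (1, "b")], [(1, "c"), (0, "d")]]
    print(shuffle(sources, 2, coordinated=False))
    print(shuffle(sources, 2, coordinated=True))
```

With two sources and two targets, both schedules deliver the same records to each target. The reported worst-case contention, however, is 2 for the naive lockstep order and 1 for the staggered order, which is the kind of hotspot that relaxed coordination is meant to avoid.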
- Paper ID: 186
- Venue: CIDR
- Year: 2013
- Pagerank: 0.0001145318
- Overall Rank: 1,543 | 89.27%
- DOI: (not listed)
Incoming Citations (Sorted by Pagerank)
Showing 27 of 27 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 241 | DB2 with BLU Acceleration: So Much More than Just a Column Store | 2013 | VLDB | 0.00031420034 |
| 404 | Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited | 2014 | VLDB | 0.00024143076 |
| 418 | Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age | 2014 | SIGMOD | 0.00023729211 |
| 1,016 | Memory-Efficient Hash Joins | 2015 | VLDB | 0.00014638492 |
| 1,044 | DimmWitted: A Study of Main-Memory Statistical Analytics | 2014 | VLDB | 0.00014475229 |
| 1,409 | High-Speed Query Processing over High-Speed Networks | 2016 | VLDB | 0.00012132768 |
| 1,607 | A Comprehensive Study of Main-Memory Partitioning and its Application to Large-Scale Comparison- and Radix-Sort | 2014 | SIGMOD | 0.00011162682 |
| 2,424 | Lambada: Interactive Data Analytics on Cold Data Using Serverless Cloud Infrastructure | 2020 | SIGMOD | 8.8380822e-05 |
| 2,519 | Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture | 2013 | VLDB | 8.6078505e-05 |
| 3,327 | Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects | 2020 | SIGMOD | 7.2205738e-05 |
| 4,248 | Hyper Dimension Shuffle: Efficient Data Repartition at Petabyte Scale in SCOPE | 2019 | VLDB | 6.3247927e-05 |
| 4,282 | Scaling Up Concurrent Main-Memory Column-Store Scans: Towards Adaptive NUMA-aware Data and Task Placement | 2015 | VLDB | 6.293052e-05 |
| 4,610 | Deployment of Query Plans on Multicores | 2015 | VLDB | 6.0516573e-05 |
| 5,109 | Adaptive NUMA-aware data placement and task scheduling for analytical workloads in main-memory column-stores | 2017 | VLDB | 5.6908086e-05 |
| 5,657 | BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures | 2019 | SIGMOD | 5.3864606e-05 |
| 5,866 | Low-Latency Handshake Join | 2014 | VLDB | 5.2968632e-05 |
| 5,877 | Taming Subgraph Isomorphism for RDF Query Processing | 2015 | VLDB | 5.2916612e-05 |
| 6,648 | Grizzly: Efficient Stream Processing Through Adaptive Query Compilation | 2020 | SIGMOD | 4.9771723e-05 |
| 7,866 | Operational Analytics Data Management Systems | 2016 | VLDB | 4.6321795e-05 |
| 7,916 | Terabyte-Scale Analytics in the Blink of an Eye | 2026 | VLDB | 4.6173899e-05 |
| 8,417 | The Case for Learned In-Memory Joins | 2023 | VLDB | 4.5194164e-05 |
| 8,513 | CXL Memory Performance for In-Memory Data Processing | 2025 | VLDB | 4.4947795e-05 |
| 9,070 | How to Stop Under-Utilization and Love Multicores | 2014 | SIGMOD | 4.4031183e-05 |
| 9,823 | Thriving in the No Man’s Land between Compilers and Databases | 2019 | CIDR | 4.2754485e-05 |
| 10,190 | P-MOSS: Scheduling Main-Memory Indexes Over NUMA Servers Using Next Token Prediction | 2026 | SIGMOD | 4.1945683e-05 |
| 11,154 | Templating Shuffles | 2023 | CIDR | 4.1945683e-05 |
| 12,062 | Next Generation Data Analytics at IBM Research | 2013 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.