Database Paper Browser

Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs

Summary: Revisits the hash join versus sort-merge join comparison on modern multi-core CPUs, using optimized parallel implementations of both. The hash join exceeds 100M tuples/s on an Intel Core i7, while the sort-merge join delivers 47–80M tuples/s. Analytical models indicate that wider SIMD and more cores favor sort-merge, suggesting that future architectures may shift the advantage away from hashing. (summarized by gpt-5-nano on Feb 09 2026)
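
For readers unfamiliar with the two algorithms being compared, the following is a minimal single-threaded sketch of an equi-join done both ways. It is a textbook illustration only, not the paper's SIMD-optimized parallel implementation; the function names `hash_join` and `sort_merge_join` and the `(key, payload)` tuple layout are assumptions made for this example.

```python
from collections import defaultdict


def hash_join(build, probe):
    """Equi-join on the key: build a hash table on one input, probe with the other."""
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    # .get() avoids inserting empty buckets for keys that have no match.
    return [(key, b, p) for key, p in probe for b in table.get(key, ())]


def sort_merge_join(left, right):
    """Equi-join on the key: sort both inputs, then merge them in one pass."""
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((lk, left[a][1], right[b][1]))
            i, j = i_end, j_end
    return out
```

Both functions return the same set of `(key, left_payload, right_payload)` tuples, possibly in a different order. The hash variant relies on random access into the table, while the sort variant relies on sequential scans, which is the access-pattern difference behind the architectural trade-off the summary describes.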

Paper ID: 9865
Venue: VLDB
Year: 2009
Pagerank: 0.0002636504
Overall Rank: 351 | 97.57%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 50 of 80 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
381 | FAST: Fast Architecture Sensitive Tree Search on Modern CPUs and GPUs | 2010 | SIGMOD | 0.00024873637
404 | Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited | 2014 | VLDB | 0.00024143076
418 | Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age | 2014 | SIGMOD | 0.00023729211
540 | Design and Evaluation of Main Memory Hash Join Algorithms for Multi-core CPUs | 2011 | SIGMOD | 0.0002063443
585 | Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems | 2012 | VLDB | 0.00019706145
930 | Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious SIMD Sort | 2010 | SIGMOD | 0.00015238545
940 | SharedDB: Killing One Thousand Queries With One Stone | 2012 | VLDB | 0.00015173166
958 | Rethinking SIMD Vectorization for In-Memory Databases | 2015 | SIGMOD | 0.00015045316
1,016 | Memory-Efficient Hash Joins | 2015 | VLDB | 0.00014638492
1,044 | DimmWitted: A Study of Main-Memory Statistical Analytics | 2014 | VLDB | 0.00014475229
1,206 | Rack-Scale In-Memory Join Processing using RDMA | 2015 | SIGMOD | 0.00013281657
1,269 | Cache locality is not enough: High-Performance Nearest Neighbor Search with Product Quantization Fast Scan | 2016 | VLDB | 0.00012930432
1,287 | Hardware-Oblivious Parallelism for In-Memory Column-Stores | 2013 | VLDB | 0.00012820443
1,607 | A Comprehensive Study of Main-Memory Partitioning and its Application to Large-Scale Comparison- and Radix-Sort | 2014 | SIGMOD | 0.00011162682
1,731 | Fast Updates on Read-Optimized Databases Using Multi-Core CPUs | 2012 | VLDB | 0.0001073454
1,804 | An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory | 2016 | SIGMOD | 0.00010501185
2,296 | Joins via Geometric Resolutions: Worst-case and Beyond | 2015 | PODS | 9.0776226e-05
2,390 | ByteSlice: Pushing the Envelop of Main Memory Data Processing with a New Storage Layout | 2015 | SIGMOD | 8.9084657e-05
2,519 | Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture | 2013 | VLDB | 8.6078505e-05
2,526 | Track Join: Distributed Joins with Minimal Network Traffic | 2014 | SIGMOD | 8.5968612e-05
2,742 | Cache-Efficient Aggregation: Hashing Is Sorting | 2015 | SIGMOD | 8.1906104e-05
2,870 | Streaming Similarity Search over one Billion Tweets using Parallel Locality-Sensitive Hashing | 2013 | VLDB | 7.9799783e-05
3,021 | Adaptive and Big Data Scale Parallel Execution in Oracle | 2013 | VLDB | 7.6991391e-05
3,151 | A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs | 2017 | SIGMOD | 7.4720668e-05
3,175 | Asynchronous Memory Access Chaining | 2016 | VLDB | 7.438501e-05
3,254 | Query Processing on Tensor Computation Runtimes | 2022 | VLDB | 7.3161051e-05
3,327 | Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects | 2020 | SIGMOD | 7.2205738e-05
3,443 | Distributed Join Algorithms on Thousands of Cores | 2017 | VLDB | 7.0887214e-05
3,550 | Chi: A Scalable and Programmable Control Plane for Distributed Stream Processing Systems | 2018 | VLDB | 6.9843512e-05
3,655 | CloudRAMSort: Fast and Efficient Large-Scale Distributed RAM Sort on Shared-Nothing Cluster | 2012 | SIGMOD | 6.8718304e-05
3,721 | To Partition, or Not to Partition, That is the Join Question in a Real System | 2021 | SIGMOD | 6.8179379e-05
3,885 | Density-optimized Intersection-free Mapping and Matrix Multiplication for Join-Project Operations | 2022 | VLDB | 6.6674822e-05
3,898 | Efficient Join Algorithms For Large Database Tables in a Multi-GPU Environment | 2021 | VLDB | 6.6551268e-05
3,993 | Improving Main Memory Hash Joins on Intel Xeon Phi Processors: An Experimental Approach | 2015 | VLDB | 6.5534805e-05
4,085 | In-Cache Query Co-Processing on Coupled CPU-GPU Architectures | 2015 | VLDB | 6.4620277e-05
4,610 | Deployment of Query Plans on Multicores | 2015 | VLDB | 6.0516573e-05
4,655 | SIMD- and Cache-Friendly Algorithm for Sorting an Array of Structures | 2015 | VLDB | 6.0221672e-05
4,678 | OmniDB: Towards Portable and Efficient Query Processing on Parallel CPU/GPU Architectures | 2013 | VLDB | 6.0046271e-05
5,125 | The Art of Balance: A RateupDB™ Experience of Building a CPU/GPU Hybrid Database Product | 2021 | VLDB | 5.679423e-05
5,178 | FPGA-based Data Partitioning | 2017 | SIGMOD | 5.6438393e-05
5,247 | Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects | 2022 | SIGMOD | 5.6057839e-05
5,293 | MQJoin: Efficient Shared Execution of Main-Memory Joins | 2016 | VLDB | 5.5815698e-05
5,376 | Holistic Indexing in Main-memory Column-stores | 2015 | SIGMOD | 5.5417421e-05
5,604 | Design and Evaluation of Storage Organizations for Read-Optimized Main Memory Databases | 2013 | VLDB | 5.4147933e-05
5,714 | MCJoin: A Memory-Constrained Join for Column-Store Main-Memory Databases | 2012 | SIGMOD | 5.3578116e-05
5,721 | FPGA-based Multithreading for In-Memory Hash Joins | 2015 | CIDR | 5.3525009e-05
5,765 | Predicate Transfer: Efficient Pre-Filtering on Multi-Join Queries | 2024 | CIDR | 5.336442e-05
5,784 | What Is the Price for Joining Securely? Benchmarking Equi-Joins in Trusted Execution Environments | 2022 | VLDB | 5.328804e-05
6,058 | ThunderRW: An In-Memory Graph Random Walk Engine | 2021 | VLDB | 5.2310254e-05
6,114 | Database Processing-in-Memory: An Experimental Study | 2020 | VLDB | 5.204248e-05

Outgoing Citations (Sorted by Pagerank)

Showing 17 of 17 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
52 | Database Architecture Optimized for the new Bottleneck: Memory Access | 1999 | VLDB | 0.00066474881
78 | Multiprocessor Hash-Based Join Algorithms | 1985 | VLDB | 0.00056413752
81 | Cache Conscious Algorithms for Relational Query Processing | 1994 | VLDB | 0.00055548574
84 | AlphaSort: A RISC Machine Sort | 1994 | SIGMOD | 0.00053866006
145 | Quickly Generating Billion-Record Synthetic Databases | 1994 | SIGMOD | 0.0004138408
233 | A Study of Index Structures for Main Memory Database Management Systems | 1986 | VLDB | 0.00032021526
239 | GPUTeraSort: High Performance Graphics Co-processor Sorting for Large Database Management | 2006 | SIGMOD | 0.00031617428
343 | Implementing Database Operations Using SIMD Instructions | 2002 | SIGMOD | 0.00026768139
714 | Adaptive Aggregation on Chip Multiprocessors | 2007 | VLDB | 0.00017730584
775 | Relational Joins on Graphics Processors | 2008 | SIGMOD | 0.00016823862
946 | Efficient Implementation of Sorting on Multi-Core SIMD CPU Architecture | 2008 | VLDB | 0.0001513324
1,079 | What happens during a Join? Dissecting CPU and Memory Optimization Effects | 2000 | VLDB | 0.00014233415
1,365 | Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning | 1991 | VLDB | 0.00012368421
1,856 | An Adaptive Hash Join Algorithm for Multiuser Environments | 1990 | VLDB | 0.00010304993
2,619 | Hash-Based Join Algorithms for Multiprocessor Computers with Shared Memory | 1990 | VLDB | 8.4431973e-05
2,763 | Executing Stream Joins on the Cell Processor | 2007 | VLDB | 8.1579306e-05
2,778 | Database Servers on Chip Multiprocessors: Limitations and Opportunities | 2007 | CIDR | 8.1321802e-05

Semantically Similar Papers