Database Paper Browser


Efficient Implementation of Sorting on Multi-Core SIMD CPU Architecture

Summary: An architecture-aware SIMD MergeSort for multi-core CPUs. The 128-bit SSE implementation is 3.3× faster than scalar code, and an efficient multiway merge keeps the sort from being memory-bandwidth bound. It sorts 64M floating-point numbers in under 0.5 s on a 4-core CPU and demonstrates scalability with SIMD width and core count, with validation on a cycle-accurate Larrabee simulator. (summarized by gpt-5-nano on Feb 09 2026)
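The core of a SIMD MergeSort is a merge kernel that combines two sorted registers using only vector min/max operations, with no data-dependent branches inside the kernel. The summary does not name the merge network. This sketch assumes a bitonic merge network, a common SIMD-friendly choice, and models it in plain Python: 4-element lists stand in for 128-bit SSE registers holding four 32-bit floats, and the names `WIDTH`, `bitonic_merge`, `merge_vectors`, and `merge_sorted` are hypothetical. It is an illustration, not the paper's implementation.

```python
from typing import List, Sequence, Tuple

WIDTH = 4  # assumption: one 128-bit SSE register holds four 32-bit floats


def bitonic_merge(seq: Sequence[float]) -> List[float]:
    """Sort a bitonic sequence whose length is a power of two.

    Each level is a "half-cleaner": lane i is compare-exchanged with lane
    i + n/2. On SIMD hardware one level maps to a vector min and a vector max.
    """
    n = len(seq)
    if n == 1:
        return list(seq)
    half = n // 2
    lo = [min(seq[i], seq[i + half]) for i in range(half)]
    hi = [max(seq[i], seq[i + half]) for i in range(half)]
    # Every element of lo is <= every element of hi, and both halves are bitonic.
    return bitonic_merge(lo) + bitonic_merge(hi)


def merge_vectors(a: Sequence[float], b: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Merge two sorted WIDTH-wide 'registers' into (smallest WIDTH, largest WIDTH)."""
    # Ascending a followed by reversed (descending) b forms a bitonic sequence.
    merged = bitonic_merge(list(a) + list(b)[::-1])
    return merged[:WIDTH], merged[WIDTH:]


def merge_sorted(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Merge two sorted lists whose lengths are multiples of WIDTH, one register at a time."""
    if len(x) % WIDTH or len(y) % WIDTH:
        raise ValueError("input lengths must be multiples of WIDTH")
    if not x or not y:
        return list(x) + list(y)
    lo, hi = merge_vectors(x[:WIDTH], y[:WIDTH])
    out = list(lo)
    i = j = WIDTH
    while i < len(x) or j < len(y):
        # Load the next block from whichever input has the smaller head element.
        if j >= len(y) or (i < len(x) and x[i] <= y[j]):
            block, i = x[i:i + WIDTH], i + WIDTH
        else:
            block, j = y[j:j + WIDTH], j + WIDTH
        lo, hi = merge_vectors(hi, block)
        out.extend(lo)  # the WIDTH smallest are now final
    out.extend(hi)  # the last register holds the WIDTH largest elements
    return out
```

For example, `merge_vectors([1, 4, 6, 8], [2, 3, 5, 7])` returns `([1, 2, 3, 4], [5, 6, 7, 8])`. Choosing the next block by comparing head elements is what keeps the emitted register correct. Every element still held in `hi` came from an already-consumed block, so it is no larger than the smaller of the two next heads.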

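The summary attributes the sort's freedom from memory-bandwidth limits to an efficient multiway merge. This plain-Python sketch illustrates why a multiway merge helps, under stated assumptions rather than as the paper's code. It builds a balanced binary tree of streaming two-way merges over k sorted runs, so each element is written to the final output once. Merging runs two at a time, level by level, instead rewrites the whole array once per level, ceil(log2 k) times. The assumption is that the intermediate buffers between tree nodes stay small enough to remain in cache; Python generators stand in for those buffers here. The names `merge2`, `merge_tree`, and `pairwise_passes` are hypothetical.

```python
from typing import Iterable, Iterator, List

_END = object()  # sentinel marking an exhausted input stream


def merge2(a: Iterator[float], b: Iterator[float]) -> Iterator[float]:
    """Lazily merge two sorted streams (one node of the merge tree)."""
    x, y = next(a, _END), next(b, _END)
    while x is not _END and y is not _END:
        if x <= y:
            yield x
            x = next(a, _END)
        else:
            yield y
            y = next(b, _END)
    # At most one stream still has data; drain it.
    if x is not _END:
        yield x
        yield from a
    if y is not _END:
        yield y
        yield from b


def merge_tree(runs: Iterable[Iterable[float]]) -> Iterator[float]:
    """Combine k sorted runs with a balanced binary tree of streaming merges."""
    nodes: List[Iterator[float]] = [iter(r) for r in runs]
    if not nodes:
        return iter(())
    while len(nodes) > 1:
        paired = [merge2(nodes[k], nodes[k + 1]) for k in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2:
            paired.append(nodes[-1])  # an odd run out is promoted to the next level
        nodes = paired
    return nodes[0]


def pairwise_passes(k: int) -> int:
    """Full read+write passes over the data when k runs are merged two at a time per level."""
    return (k - 1).bit_length() if k > 1 else 0
```

With 8 runs, `pairwise_passes(8)` is 3: level-by-level merging streams the whole array through main memory three times. The tree pulls each element through all three levels in a single pass.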
Paper ID: 9756
Venue: VLDB
Year: 2008
Pagerank: 0.0001513324
Overall Rank: 946 | 93.43%
DOI: -



Incoming Citations (Sorted by Pagerank)

Showing 38 of 38 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
351 | Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs | 2009 | VLDB | 0.0002636504
381 | FAST: Fast Architecture Sensitive Tree Search on Modern CPUs and GPUs | 2010 | SIGMOD | 0.00024873637
404 | Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited | 2014 | VLDB | 0.00024143076
585 | Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems | 2012 | VLDB | 0.00019706145
930 | Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious SIMD Sort | 2010 | SIGMOD | 0.00015238545
950 | Data Processing on FPGAs | 2009 | VLDB | 0.00015108484
958 | Rethinking SIMD Vectorization for In-Memory Databases | 2015 | SIGMOD | 0.00015045316
1,263 | Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation | 2016 | SIGMOD | 0.00012982857
1,269 | Cache locality is not enough: High-Performance Nearest Neighbor Search with Product Quantization Fast Scan | 2016 | VLDB | 0.00012930432
1,607 | A Comprehensive Study of Main-Memory Partitioning and its Application to Large-Scale Comparison- and Radix-Sort | 2014 | SIGMOD | 0.00011162682
1,731 | Fast Updates on Read-Optimized Databases Using Multi-Core CPUs | 2012 | VLDB | 0.0001073454
2,006 | PALM: Parallel Architecture-Friendly Latch-Free Modifications to B+ Trees on Many-Core Processors | 2011 | VLDB | 9.8101551e-05
2,390 | ByteSlice: Pushing the Envelop of Main Memory Data Processing with a New Storage Layout | 2015 | SIGMOD | 8.9084657e-05
3,151 | A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs | 2017 | SIGMOD | 7.4720668e-05
3,448 | Faster Set Intersection with SIMD instructions by Reducing Branch Mispredictions | 2015 | VLDB | 7.0844401e-05
3,655 | CloudRAMSort: Fast and Efficient Large-Scale Distributed RAM Sort on Shared-Nothing Cluster | 2012 | SIGMOD | 6.8718304e-05
4,042 | PARADIS: An Efficient Parallel Algorithm for In-place Radix Sort | 2015 | VLDB | 6.5026989e-05
4,655 | SIMD- and Cache-Friendly Algorithm for Sorting an Array of Structures | 2015 | VLDB | 6.0221672e-05
5,125 | The Art of Balance: A RateupDB™ Experience of Building a CPU/GPU Hybrid Database Product | 2021 | VLDB | 5.679423e-05
5,784 | What Is the Price for Joining Securely? Benchmarking Equi-Joins in Trusted Execution Environments | 2022 | VLDB | 5.328804e-05
6,041 | FPGA: What's in it for a Database? | 2009 | SIGMOD | 5.2407055e-05
6,114 | Database Processing-in-Memory: An Experimental Study | 2020 | VLDB | 5.204248e-05
6,434 | Patience is a Virtue: Revisiting Merge and Sort on Modern Processors | 2014 | SIGMOD | 5.0640194e-05
6,540 | Data Partitioning for In-Memory Systems: Myths, Challenges, and Opportunities | 2019 | CIDR | 5.0219214e-05
7,097 | Fast Multi-Column Sorting in Main-Memory Column-Stores | 2016 | SIGMOD | 4.8336115e-05
7,335 | MorphStore: Analytical Query Engine with a Holistic Compression-Enabled Processing Model | 2020 | VLDB | 4.7603723e-05
7,551 | Efficient Top-K Query Processing on Massively Parallel Hardware | 2018 | SIGMOD | 4.7134746e-05
8,018 | Parallelizing Intra-Window Join on Multicores: An Experimental Study | 2021 | SIGMOD | 4.6046381e-05
8,051 | Building Advanced SQL Analytics From Low-Level Plan Operators | 2021 | SIGMOD | 4.5969549e-05
8,381 | Interleaved Multi-Vectorizing | 2020 | VLDB | 4.5310603e-05
8,626 | Adaptive Code Generation for Data-Intensive Analytics | 2021 | VLDB | 4.4829152e-05
8,702 | Efficient Evaluation of Arbitrarily-Framed Holistic SQL Aggregates and Window Functions | 2022 | SIGMOD | 4.4650384e-05
8,927 | An Application-Specific Instruction Set for Accelerating Set-Oriented Database Primitives | 2014 | SIGMOD | 4.427232e-05
9,838 | Efficiently Joining Large Relations on Multi-GPU Systems | 2025 | VLDB | 4.2740344e-05
10,121 | TQEx: Tensor-based Query Engine Enhanced by Bridging the Gap | 2026 | SIGMOD | 4.1945683e-05
10,981 | Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and Quality | 2024 | SIGMOD | 4.1945683e-05
11,381 | Origami: A High-Performance Mergesort Framework | 2022 | VLDB | 4.1945683e-05
11,843 | Efficient Query Processing on Many-core Architectures: A Case Study with Intel Xeon Phi Processor | 2016 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
239 | GPUTeraSort: High Performance Graphics Co-processor Sorting for Large Database Management | 2006 | SIGMOD | 0.00031617428
1,760 | CellSort: High Performance Sorting on the Cell Processor | 2007 | VLDB | 0.00010651836

Semantically Similar Papers