Faster Set Intersection with SIMD instructions by Reducing Branch Mispredictions
Summary: Presents a SIMD-accelerated set intersection for sorted arrays by extending the merge-based approach to read multiple elements per step and compare many pairs, reducing branch mispredictions. No preprocessing; can replace std::set_intersection; up to 5.2x with SIMD (2.1x without) on 32/64-bit ints, on Xeon and POWER7+. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
- 1. Hiroshi Inoue
- 2. Moriyoshi Ohara
- 3. Kenjiro Taura
Incoming Citations (Sorted by Pagerank)
Showing 9 of 9 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 342 | EmptyHeaded: A Relational Engine for Graph Processing | 2016 | SIGMOD | 0.00026795977 |
| 1,973 | Speeding Up Set Intersections in Graph Algorithms using SIMD Instructions | 2018 | SIGMOD | 9.8913631e-05 |
| 4,556 | Distributed Subgraph Matching on Timely Dataflow | 2019 | VLDB | 6.0883757e-05 |
| 4,655 | SIMD- and Cache-Friendly Algorithm for Sorting an Array of Structures | 2015 | VLDB | 6.0221672e-05 |
| 7,693 | Processing and Optimizing Main Memory Spatial-Keyword Queries | 2016 | VLDB | 4.6759281e-05 |
| 8,156 | List Intersection for Web Search: Algorithms, Cost Models, and Optimizations | 2019 | VLDB | 4.5741172e-05 |
| 8,381 | Interleaved Multi-Vectorizing | 2020 | VLDB | 4.5310603e-05 |
| 9,173 | HERO: A Hierarchical Set Partitioning and Join Framework for Speeding up the Set Intersection Over Graphs | 2024 | SIGMOD | 4.3842827e-05 |
| 11,381 | Origami: A High-Performance Mergesort Framework | 2022 | VLDB | 4.1945683e-05 |
Previous
Page 1 / 1
Next
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 343 | Implementing Database Operations Using SIMD Instructions | 2002 | SIGMOD | 0.00026768139 |
| 946 | Efficient Implementation of Sorting on Multi-Core SIMD CPU Architecture | 2008 | VLDB | 0.0001513324 |
| 1,124 | Improving the Performance of List Intersection | 2009 | VLDB | 0.00013847565 |
| 2,051 | Efficient Parallel Lists Intersection and Index Compression Algorithms using Graphics Processing Units | 2011 | VLDB | 9.686731e-05 |
| 2,464 | Fast Set Intersection in Memory | 2011 | VLDB | 8.7524354e-05 |
Previous
Page 1 / 1
Next
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,381 | Interleaved Multi-Vectorizing | 2020 | VLDB | 4.5310603e-05 |
| 8,156 | List Intersection for Web Search: Algorithms, Cost Models, and Optimizations | 2019 | VLDB | 4.5741172e-05 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 958 | Rethinking SIMD Vectorization for In-Memory Databases | 2015 | SIGMOD | 0.00015045316 |
| 4,655 | SIMD- and Cache-Friendly Algorithm for Sorting an Array of Structures | 2015 | VLDB | 6.0221672e-05 |
| 946 | Efficient Implementation of Sorting on Multi-Core SIMD CPU Architecture | 2008 | VLDB | 0.0001513324 |
| 1,124 | Improving the Performance of List Intersection | 2009 | VLDB | 0.00013847565 |
| 343 | Implementing Database Operations Using SIMD Instructions | 2002 | SIGMOD | 0.00026768139 |
| 2,464 | Fast Set Intersection in Memory | 2011 | VLDB | 8.7524354e-05 |
| 1,973 | Speeding Up Set Intersections in Graph Algorithms using SIMD Instructions | 2018 | SIGMOD | 9.8913631e-05 |