DiskJoin: Large-scale Vector Similarity Join with SSD
Summary: DiskJoin is the first disk-based similarity join for billion-scale vectors on a single machine. It leverages NVMe SSDs to avoid costly cluster communication, minimizes read amplification through SSD-aware access, uses dynamic caching and eviction policies, and applies probabilistic pruning to achieve 50×–1000× speedups. (summarized by gpt-5-mini on Feb 11 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Yanqi Chen
2. Xiao Yan
3. Alexandra Meliou
4. Eric Lo
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 17 of 17 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,220 | Similarity Join Size Estimation using Locality Sensitive Hashing | 2011 | VLDB | 5.6216111e-05 |
| 10,706 | Extensible and Robust Evaluation of Similarity Queries | 2025 | VLDB | 4.1945683e-05 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 9,143 | Similarity Query Processing Using Disk Arrays | 1998 | SIGMOD | 4.3850454e-05 |
| 7,765 | Cache-oblivious High-performance Similarity Join | 2019 | SIGMOD | 4.6572085e-05 |
| 13,473 | Exploiting Database Similarity Joins for Metric Spaces | 2012 | VLDB | - |
| 6,507 | Similarity Join over Array Data | 2016 | SIGMOD | 5.0337166e-05 |
| 10,930 | Similarity Joins of Sparse Features | 2024 | SIGMOD | 4.1945683e-05 |
| 3,141 | ClusterJoin: A Similarity Joins Framework using Map-Reduce | 2014 | VLDB | 7.4829448e-05 |
| 8,899 | Fast Approximate Similarity Join in Vector Databases | 2025 | SIGMOD | 4.427232e-05 |