Bullion: A Column Store for Machine Learning
Summary: Bullion is a columnar store tailored for ML workloads. It combines modular cascading encodings, in-storage feature quantization, optimized encodings for long-sequence sparse features, and quality-aware sequential reads for wide-table and multimodal training. The result is lower I/O for deletion compliance, substantial storage savings on sparse features, and faster metadata parsing than existing column stores. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Gang Liao
2. Ye Liu
3. Jianjun Chen
4. Daniel J. Abadi
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,201 | F3: The Open-Source Data File Format for the Future | 2026 | SIGMOD | 4.3743539e-05 |
| 9,645 | The FastLanes File Format | 2025 | VLDB | 4.3109001e-05 |
| 9,701 | Towards Functional Decomposition of Storage Formats | 2025 | CIDR | 4.3008468e-05 |
| 10,777 | Magnus: A Holistic Approach to Data Management for Large-Scale Machine Learning Workloads | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 27 of 27 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,777 | Magnus: A Holistic Approach to Data Management for Large-Scale Machine Learning Workloads | 2025 | VLDB | 4.1945683e-05 |
| 2,613 | Decomposed Bounded Floats for Fast Compression and Queries | 2021 | VLDB | 8.4503824e-05 |
| 1,590 | Column-oriented Database Systems | 2009 | VLDB | 0.00011233838 |
| 10,220 | FlatStor: An Efficient Embedded-Index Based Columnar Data Layout for Multimodal Data Workloads | 2026 | VLDB | 4.1945683e-05 |
| 9,236 | The Hopsworks Feature Store for Machine Learning | 2024 | SIGMOD | 4.3690661e-05 |
| 4,003 | Data Platform for Machine Learning | 2019 | SIGMOD | 6.54347e-05 |
| 6,336 | Column Stores For Wide and Sparse Data | 2007 | CIDR | 5.1056582e-05 |
| 9,690 | Frequency-Store: Scaling Image AI by A Column-Store for Images | 2025 | CIDR | 4.3037432e-05 |
| 4,514 | An Empirical Evaluation of Columnar Storage Formats | 2024 | VLDB | 6.1204636e-05 |
| 6,404 | ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation | 2019 | VLDB | 5.0786954e-05 |