Database Paper Browser

Bullion: A Column Store for Machine Learning

Summary: Bullion is a columnar store tailored for ML workloads. It combines modular cascading encodings, in-storage feature quantization, optimized encodings for long-sequence sparse features, and quality-aware sequential reads for wide-table and multimodal training. It yields lower I/O for deletion compliance, substantial storage savings on sparse features, and faster metadata parsing than existing column stores. (Summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 553
Venue: CIDR
Year: 2025
Pagerank: 4.7204398e-05
Overall Rank: 7,469 | 48.05%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
9,201 | F3: The Open-Source Data File Format for the Future | 2026 | SIGMOD | 4.3743539e-05
9,645 | The FastLanes File Format | 2025 | VLDB | 4.3109001e-05
9,701 | Towards Functional Decomposition of Storage Formats | 2025 | CIDR | 4.3008468e-05
10,777 | Magnus: A Holistic Approach to Data Management for Large-Scale Machine Learning Workloads | 2025 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 27 of 27 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
21 | C-Store: A Column-oriented DBMS | 2005 | VLDB | 0.00086087497
70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166
80 | Weaving Relations for Cache Performance | 2001 | VLDB | 0.00055721729
131 | Integrating Compression and Execution in Column-Oriented Database Systems | 2006 | SIGMOD | 0.0004370331
167 | The Snowflake Elastic Data Warehouse | 2016 | SIGMOD | 0.00039180521
210 | Gorilla: A Fast, Scalable, In-Memory Time Series Database | 2015 | VLDB | 0.0003404384
310 | The Vertica Analytic Database: C-Store 7 Years Later | 2012 | VLDB | 0.00028132402
426 | Amazon Redshift and the Case for Simpler Data Warehouses | 2015 | SIGMOD | 0.00023594359
476 | Impala: A Modern, Open-Source SQL Engine for Hadoop | 2015 | CIDR | 0.00022226941
497 | Column-Stores vs. Row-Stores: How Different Are They Really? | 2008 | SIGMOD | 0.00021716559
746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979
1,284 | Amazon Redshift Re-invented | 2022 | SIGMOD | 0.00012837822
1,377 | Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics | 2021 | CIDR | 0.00012296941
1,590 | Column-oriented Database Systems | 2009 | VLDB | 0.00011233838
1,943 | Procella: Unifying serving and analytical data at YouTube | 2019 | VLDB | 0.00010012569
2,064 | Chimp: Efficient Lossless Floating Point Compression for Time Series Databases | 2022 | VLDB | 9.6418929e-05
2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05
2,998 | Major Technical Advancements in Apache Hive | 2014 | SIGMOD | 7.753765e-05
3,644 | BtrBlocks: Efficient Columnar Compression for Data Lakes | 2023 | SIGMOD | 6.8854928e-05
3,844 | The evolution of Amazon Redshift (extended abstract) | 2021 | VLDB | 6.7076451e-05
4,055 | Online, Asynchronous Schema Change in F1 | 2013 | VLDB | 6.4910596e-05
4,507 | ALP: Adaptive Lossless floating-Point Compression | 2023 | SIGMOD | 6.131017e-05
4,514 | An Empirical Evaluation of Columnar Storage Formats | 2024 | VLDB | 6.1204636e-05
5,865 | ByteHTAP: ByteDance’s HTAP System with High Data Freshness and Strong Data Consistency | 2022 | VLDB | 5.296893e-05
6,336 | Column Stores For Wide and Sparse Data | 2007 | CIDR | 5.1056582e-05
6,715 | Shared Foundations: Modernizing Meta's Data Lakehouse | 2023 | CIDR | 4.9509939e-05
7,886 | BullFrog: Online Schema Evolution via Lazy Evaluation | 2021 | SIGMOD | 4.6263924e-05

Semantically Similar Papers