An Empirical Evaluation of Columnar Storage Formats
Summary: Revisits Parquet and ORC with a stress-test benchmark on modern hardware and workloads, pinpointing internal choices that favor today's analytics: default dictionary encoding, integer encodings optimized for decode speed, optional block compression, and finer-grained auxiliary structures. Shows format inefficiencies for common ML workflows and GPU decoding, and derives concrete guidelines for next-generation columnar formats. (summarized by gpt-5-mini on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
1. Xinyu Zeng
2. Yulong Hui
3. Jiahong Shen
4. Andrew Pavlo
5. Wes McKinney
6. Huanchen Zhang
Incoming Citations (Sorted by Pagerank)
Showing 20 of 20 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 37 of 37 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,689 | LST-Bench: Benchmarking Log-Structured Tables in the Cloud | 2024 | SIGMOD | 4.3043822e-05 |
| 9,645 | The FastLanes File Format | 2025 | VLDB | 4.3109001e-05 |
| 3,644 | BtrBlocks: Efficient Columnar Compression for Data Lakes | 2023 | SIGMOD | 6.8854928e-05 |
| 9,201 | F3: The Open-Source Data File Format for the Future | 2026 | SIGMOD | 4.3743539e-05 |
| 6,802 | Understanding Insights into the Basic Structure and Essential Issues of Table Placement Methods in Clusters | 2013 | VLDB | 4.9226626e-05 |
| 6,666 | Mainlining Databases: Supporting Fast Transactional Workloads on Universal Columnar Data File Formats | 2021 | VLDB | 4.9691571e-05 |
| 7,427 | Selection Pushdown in Column Stores using Bit Manipulation Instructions | 2023 | SIGMOD | 4.7327406e-05 |
| 9,701 | Towards Functional Decomposition of Storage Formats | 2025 | CIDR | 4.3008468e-05 |
| 3,208 | Column-Oriented Storage Techniques for MapReduce | 2011 | VLDB | 7.3781897e-05 |
| 5,562 | A Deep Dive into Common Open Formats for Analytical DBMSs | 2023 | VLDB | 5.4331334e-05 |