Good to the Last Bit: Data-Driven Encoding with CodecDB
Summary: CodecDB is an encoding-aware columnar database that tightly couples data-driven encoding selection with encoding-aware query operators that work directly on encoded data. It attains ~90% accuracy in encoding selection, up to 40% better compression, and speedups of ~10x on TPC-H and ~3x on SSB.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 6174
- Venue: SIGMOD
- Year: 2021
- Pagerank: 5.0941072e-05
- Overall Rank: 6,367 | 55.71%
- DOI: 10.1145/3448016.3457283
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 22 of 22 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,381 | TSB-UAD: An End-to-End Benchmark Suite for Univariate Time-Series Anomaly Detection | 2022 | VLDB | 8.9327638e-05 |
| 3,416 | LeCo: Lightweight Compression via Learning Serial Correlations | 2024 | SIGMOD | 7.1196234e-05 |
| 3,943 | Volume Under the Surface: A New Accuracy Evaluation Measure for Time-Series Anomaly Detection | 2022 | VLDB | 6.6099833e-05 |
| 4,079 | Choose Wisely: An Extensive Evaluation of Model Selection for Anomaly Detection in Time Series | 2023 | VLDB | 6.4663636e-05 |
| 4,514 | An Empirical Evaluation of Columnar Storage Formats | 2024 | VLDB | 6.1204636e-05 |
| 5,562 | A Deep Dive into Common Open Formats for Analytical DBMSs | 2023 | VLDB | 5.4331334e-05 |
| 8,578 | Robust and Budget-Constrained Encoding Configurations for In-Memory Database Systems | 2022 | VLDB | 4.4923477e-05 |
| 9,294 | Theseus: Navigating the Labyrinth of Time-Series Anomaly Detection | 2022 | VLDB | 4.3608061e-05 |
| 9,329 | Odyssey: An Engine Enabling The Time-Series Clustering Journey | 2023 | VLDB | 4.3556432e-05 |
| 9,599 | SPARTAN: Data-Adaptive Symbolic Time-Series Approximation | 2025 | SIGMOD | 4.3177432e-05 |
| 9,645 | The FastLanes File Format | 2025 | VLDB | 4.3109001e-05 |
| 9,906 | Rethinking the Encoding of Integers for Scans on Skewed Data | 2023 | SIGMOD | 4.2578595e-05 |
| 10,281 | GPU Acceleration of SQL Analytics on Compressed Data | 2026 | VLDB | 4.1945683e-05 |
| 10,466 | A Structured Study of Multivariate Time-Series Distance Measures | 2025 | SIGMOD | 4.1945683e-05 |
| 10,524 | Understanding the Black Box: A Deep Empirical Dive into Shapley Value Approximations for Tabular Data | 2025 | SIGMOD | 4.1945683e-05 |
| 10,674 | Improving Time Series Data Compression in Apache IoTDB | 2025 | VLDB | 4.1945683e-05 |
| 10,738 | TSB-AutoAD: Towards Automated Solutions for Time-Series Anomaly Detection | 2025 | VLDB | 4.1945683e-05 |
| 10,739 | Time-Series Clustering: A Comprehensive Study of Data Mining, Machine Learning, and Deep Learning Methods | 2025 | VLDB | 4.1945683e-05 |
| 10,741 | Beyond Compression: A Comprehensive Evaluation of Lossless Floating-Point Compression | 2025 | VLDB | 4.1945683e-05 |
| 11,094 | Time-Series Anomaly Detection: Overview and New Trends | 2024 | VLDB | 4.1945683e-05 |
| 11,224 | Homomorphic Compression: Making Text Processing on Compression Unlimited | 2023 | SIGMOD | 4.1945683e-05 |
| 11,235 | Accelerating Similarity Search for Elastic Measures: A Study and New Generalization of Lower Bounding Distances | 2023 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 18 of 18 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614 |
| 21 | C-Store: A Column-oriented DBMS | 2005 | VLDB | 0.00086087497 |
| 131 | Integrating Compression and Execution in Column-Oriented Database Systems | 2006 | SIGMOD | 0.0004370331 |
| 305 | SIMD-Scan: Ultra Fast in-Memory Table Scan using on-Chip Vector Processing Units | 2009 | VLDB | 0.00028248614 |
| 476 | Impala: A Modern, Open-Source SQL Engine for Hadoop | 2015 | CIDR | 0.00022226941 |
| 898 | Data Compression Support in Databases | 1994 | VLDB | 0.00015525779 |
| 1,100 | Query Optimization In Compressed Database Systems | 2001 | SIGMOD | 0.00014072277 |
| 1,263 | Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation | 2016 | SIGMOD | 0.00012982857 |
| 1,270 | BitWeaving: Fast Scans for Main Memory Data Processing | 2013 | SIGMOD | 0.00012926086 |
| 2,693 | An Architecture for Recycling Intermediates in a Column-store | 2009 | SIGMOD | 8.2883398e-05 |
| 2,856 | Efficient Index Compression in DB2 LUW | 2009 | VLDB | 8.0056412e-05 |
| 4,602 | Accelerating Raw Data Analysis with the ACCORDA Software and Hardware Architecture | 2019 | VLDB | 6.0567387e-05 |
| 5,236 | Online Deduplication for Databases | 2017 | SIGMOD | 5.611324e-05 |
| 5,835 | Order-Preserving Key Compression for In-Memory Search Trees | 2020 | SIGMOD | 5.30905e-05 |
| 6,157 | Compression Aware Physical Database Design | 2011 | VLDB | 5.1801143e-05 |
| 6,311 | VergeDB: A Database for IoT Analytics on Edge Devices | 2021 | CIDR | 5.1161316e-05 |
| 7,335 | MorphStore: Analytical Query Engine with a Holistic Compression-Enabled Processing Model | 2020 | VLDB | 4.7603723e-05 |
| 8,088 | PIDS: Attribute Decomposition for Improved Compression and Query Performance in Columnar Storage | 2020 | VLDB | 4.5897316e-05 |
Semantically Similar Papers