Beyond Compression: A Comprehensive Evaluation of Lossless Floating-Point Compression

Summary: A comprehensive empirical evaluation of lossless floating-point compressors for columnar engines. It measures compression ratio, in-situ DB query performance, and ML query performance (distance, k-NN/RAG). The work is implemented in Rust and released as a library. It finds clear trade-offs: no single method dominates on both space and query speed, and some compressors achieve higher compression at the cost of slower DB/ML query performance. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 14055
Venue: VLDB
Year: 2025
Pagerank: 4.1905499e-05
Overall Rank: 10,748 | 25.31%
DOI: 10.14778/3749646.3749701

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 1 of 1 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
10,746	Time-Series Clustering: A Comprehensive Study of Data Mining, Machine Learning, and Deep Learning Methods	2025	VLDB	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 36 of 36 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
132	Integrating Compression and Execution in Column-Oriented Database Systems	2006	SIGMOD	0.00043697853
211	Gorilla: A Fast, Scalable, In-Memory Time Series Database	2015	VLDB	0.0003401421
1,267	BitWeaving: Fast Scans for Main Memory Data Processing	2013	SIGMOD	0.00012917585
1,510	k-Shape: Efficient and Accurate Clustering of Time Series	2015	SIGMOD	0.00011588558
1,590	Column-oriented Database Systems	2009	VLDB	0.00011230525
2,032	SAND: Streaming Subsequence Anomaly Detection	2021	VLDB	9.7320795e-05
2,070	Chimp: Efficient Lossless Floating Point Compression for Time Series Databases	2022	VLDB	9.6331032e-05
2,381	TSB-UAD: An End-to-End Benchmark Suite for Univariate Time-Series Anomaly Detection	2022	VLDB	8.9241557e-05
2,390	ByteSlice: Pushing the Envelop of Main Memory Data Processing with a New Storage Layout	2015	SIGMOD	8.9006978e-05
2,619	Decomposed Bounded Floats for Fast Compression and Queries	2021	VLDB	8.4427442e-05
3,642	BtrBlocks: Efficient Columnar Compression for Data Lakes	2023	SIGMOD	6.8876984e-05
3,946	Volume Under the Surface: A New Accuracy Evaluation Measure for Time-Series Anomaly Detection	2022	VLDB	6.6036232e-05
4,062	GRAIL: Efficient Time-Series Representation Learning	2019	VLDB	6.4792249e-05
4,082	Choose Wisely: An Extensive Evaluation of Model Selection for Anomaly Detection in Time Series	2023	VLDB	6.4601453e-05
4,389	Elf: Erasing-based Lossless Floating-Point Compression	2023	VLDB	6.2197225e-05
4,509	ALP: Adaptive Lossless floating-Point Compression	2023	SIGMOD	6.1251244e-05
4,520	The FastLanes Compression Layout: Decoding >100 Billion Integers per Second with Scalar Code	2023	VLDB	6.1119645e-05
4,854	Debunking Four Long-Standing Misconceptions of Time-Series Distance Measures	2020	SIGMOD	5.8707943e-05
5,039	Tile-based Lightweight Integer Compression in GPU	2022	SIGMOD	5.7369993e-05
5,571	A Deep Dive into Common Open Formats for Analytical DBMSs	2023	VLDB	5.4279553e-05
6,311	VergeDB: A Database for IoT Analytics on Edge Devices	2021	CIDR	5.1112212e-05
6,366	Good to the Last Bit: Data-Driven Encoding with CodecDB	2021	SIGMOD	5.0892171e-05
7,392	MOST: Model-Based Compression with Outlier Storage for Time Series Data	2023	SIGMOD	4.737456e-05
7,431	CompressDB: Enabling Efficient Compressed Data Direct Processing for Various Databases	2022	SIGMOD	4.7274757e-05
8,090	PIDS: Attribute Decomposition for Improved Compression and Query Performance in Columnar Storage	2020	VLDB	4.5853298e-05
8,584	FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point Data	2024	VLDB	4.4857511e-05
9,299	Theseus: Navigating the Labyrinth of Time-Series Anomaly Detection	2022	VLDB	4.356626e-05
9,334	Odyssey: An Engine Enabling The Time-Series Clustering Journey	2023	VLDB	4.351469e-05
9,599	SPARTAN: Data-Adaptive Symbolic Time-Series Approximation	2025	SIGMOD	4.3136057e-05
10,476	A Structured Study of Multivariate Time-Series Distance Measures	2025	SIGMOD	4.1905499e-05
10,725	BURST: Rendering Clustering Techniques Suitable for Evolving Streams	2025	VLDB	4.1905499e-05
10,745	TSB-AutoAD: Towards Automated Solutions for Time-Series Anomaly Detection	2025	VLDB	4.1905499e-05
10,746	Time-Series Clustering: A Comprehensive Study of Data Mining, Machine Learning, and Deep Learning Methods	2025	VLDB	4.1905499e-05
11,097	Time-Series Anomaly Detection: Overview and New Trends	2024	VLDB	4.1905499e-05
11,237	Accelerating Similarity Search for Elastic Measures: A Study and New Generalization of Lower Bounding Distances	2023	VLDB	4.1905499e-05
13,274	SAND in Action: Subsequence Anomaly Detection for Streams	2021	VLDB	-

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
10,175	Improving LZ4 for Effective Compression and Efficient Query	2026	SIGMOD	4.1905499e-05
3,541	Similarity search in the blink of an eye with compressed indices	2023	VLDB	6.9910982e-05
1,098	Query Optimization In Compressed Database Systems	2001	SIGMOD	0.00014070252
9,410	Revisiting B-tree Compression: An Experimental Study	2024	SIGMOD	4.3399748e-05
3,501	A New Compression Method with Fast Searching on Large Databases	1987	VLDB	7.033534e-05
7,431	CompressDB: Enabling Efficient Compressed Data Direct Processing for Various Databases	2022	SIGMOD	4.7274757e-05
9,414	Experimental Analysis of Large-scale Learnable Vector Storage Compression	2024	VLDB	4.3399748e-05
2,619	Decomposed Bounded Floats for Fast Compression and Queries	2021	VLDB	8.4427442e-05
8,584	FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point Data	2024	VLDB	4.4857511e-05
11,578	An Evaluation of Methods of Compressing Doubles	2020	SIGMOD	4.1905499e-05