High-Ratio Compression for Machine-Generated Data

Summary: Pattern-Based Compression (PBC) exploits recurring patterns in machine-generated data to achieve Pareto-optimal compression. Because each record is encoded on its own, individual records can be accessed quickly at random. PBC reports roughly 2× the compression ratio of state-of-the-art techniques and is production-ready for deployment in database systems. (summarized by gpt-5-nano on Feb 09 2026)
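
To make the per-record idea in the summary concrete, here is a minimal Python sketch. It is not the paper's algorithm: the `PATTERNS` list, the `{}` placeholder syntax, and the `encode`/`decode` helpers are all hypothetical, and the patterns are hard-coded rather than discovered from data. It only illustrates how storing a pattern id plus the variable fields lets each record be decoded without touching its neighbours.

```python
import re

# Hypothetical, hand-written patterns; "{}" marks a variable field.
# A real system would discover patterns from sample data instead.
PATTERNS = [
    "GET /api/v1/users/{} HTTP/1.1 status={}",
    "ERROR disk {} is full on host {}",
]


def _compile(pattern):
    """Turn a pattern into a regex that captures each variable field."""
    literals = pattern.split("{}")
    return re.compile("(.*?)".join(re.escape(lit) for lit in literals))


COMPILED = [_compile(p) for p in PATTERNS]


def encode(record):
    """Encode one record as (pattern_id, fields); -1 means stored raw."""
    for pattern_id, regex in enumerate(COMPILED):
        match = regex.fullmatch(record)
        if match:
            return (pattern_id, list(match.groups()))
    return (-1, [record])  # no pattern matched: keep the record verbatim


def decode(encoded):
    """Rebuild a single record from its own encoding only."""
    pattern_id, fields = encoded
    if pattern_id == -1:
        return fields[0]
    literals = PATTERNS[pattern_id].split("{}")
    pieces = [literals[0]]
    for field, literal in zip(fields, literals[1:]):
        pieces.append(field)
        pieces.append(literal)
    return "".join(pieces)


records = [
    "GET /api/v1/users/42 HTTP/1.1 status=200",
    "ERROR disk sda1 is full on host db-07",
    "a line that matches no pattern",
]
encoded = [encode(r) for r in records]

# Random access: decode the second record without reading the others.
print(decode(encoded[1]))
```

Because every encoded entry is self-contained, fetching record `i` costs one lookup and one `decode` call, in contrast to block-based compressors that must decompress a whole block to reach a single record.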

Paper ID: 6748
Venue: SIGMOD
Year: 2023
Pagerank: 4.3153078e-05
Overall Rank: 9,595 | 33.32%
DOI: 10.1145/3626732

Incoming Citations (Sorted by Pagerank)

Showing 3 of 3 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
9,646	The FastLanes File Format	2025	VLDB	4.3067693e-05
10,303	Morphing-based Compression for Data-centric ML Pipelines	2026	VLDB	4.1905499e-05
10,708	LogLite: Lightweight Plug-and-Play Streaming Log Compression	2025	VLDB	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
1,085	HOT: A Height Optimized Trie Index for Main-Memory Database Systems	2018	SIGMOD	0.00014173956
1,134	Dictionary-based Order-preserving String Compression for Main Memory Column Stores	2009	SIGMOD	0.00013751593
1,169	SuRF: Practical Range Query Filtering with Fast Succinct Tries	2018	SIGMOD	0.00013530267
1,599	Semantic Compression and Pattern Extraction with Fascicles	1999	VLDB	0.00011203327
2,137	How to Wring a Table Dry: Entropy Compression of Relations and Querying of Compressed Relations	2006	VLDB	9.463768e-05
2,906	SPARTAN: A Model-Based Semantic Compression System for Massive Data Tables	2001	SIGMOD	7.9324961e-05
3,741	DeepSqueeze: Deep Semantic Compression for Tabular Data	2020	SIGMOD	6.7952067e-05
3,757	White-box Compression: Learning and Exploiting Compact Table Representations	2020	CIDR	6.7804933e-05
4,702	JSON Tiles: Fast Analytics on Semi-Structured Data	2021	SIGMOD	5.9796907e-05
5,847	Order-Preserving Key Compression for In-Memory Search Trees	2020	SIGMOD	5.3040014e-05
6,311	VergeDB: A Database for IoT Analytics on Edge Devices	2021	CIDR	5.1112212e-05
7,432	Adaptive Log Compression for Massive Log Data	2013	SIGMOD	4.7272331e-05
8,090	PIDS: Attribute Decomposition for Improved Compression and Query Performance in Columnar Storage	2020	VLDB	4.5853298e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
3,501	A New Compression Method with Fast Searching on Large Databases	1987	VLDB	7.033534e-05
689	Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences	1997	SIGMOD	0.00018069077
6,160	Compression Aware Physical Database Design	2011	VLDB	5.1750659e-05
132	Integrating Compression and Execution in Column-Oriented Database Systems	2006	SIGMOD	0.00043697853
8,575	Robust and Budget-Constrained Encoding Configurations for In-Memory Database Systems	2022	VLDB	4.4880409e-05
9,410	Revisiting B-tree Compression: An Experimental Study	2024	SIGMOD	4.3399748e-05
4,469	Comprehensive and Efficient Workload Compression	2021	VLDB	6.1535623e-05
9,665	Fingerprints for Compressed Columnar Data Search	2019	SIGMOD	4.3041238e-05
1,098	Query Optimization In Compressed Database Systems	2001	SIGMOD	0.00014070252
7,431	CompressDB: Enabling Efficient Compressed Data Direct Processing for Various Databases	2022	SIGMOD	4.7274757e-05