Rearranging Data to Maximize the Efficiency of Compression
Summary: Permutes the categories of each attribute to maximize the compression of multi-attribute categorical data. In the deterministic setting, finding the category rearrangement that minimizes run-length-encoded (RLE) size is NP‑complete, via a reduction from the rectilinear traveling salesman problem (TSP). Under a probabilistic model, the optimal order is a “double pipe organ” arrangement, and an O(n^2) algorithm finds it for k‑dimensional data with fixed k. (summarized by gpt-5-mini on Feb 09 2026)
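To make the optimization target concrete, here is a small Python sketch. It is not the authors' algorithm. It assumes two attributes whose category combinations are stored as a 0/1 presence matrix, which is linearized row by row and run-length encoded. It then counts RLE runs under a chosen order of row categories and column categories. The helper names (`count_runs`, `linearized_runs`, `best_order_brute_force`) and the example matrix `M` are hypothetical. The exhaustive search is only a baseline that tries all n!·m! orderings, and that cost is what becomes infeasible as the number of categories grows.

```python
from itertools import permutations

def count_runs(seq):
    """Number of maximal runs of equal values, i.e. the number of (value, length) RLE pairs."""
    if not seq:
        return 0
    runs = 1
    for prev, cur in zip(seq, seq[1:]):
        if cur != prev:
            runs += 1
    return runs

def linearized_runs(matrix, row_order, col_order):
    """RLE runs of the row-major linearization of `matrix` after permuting
    rows (categories of attribute A) and columns (categories of attribute B).
    Hypothetical helper for illustration only."""
    flat = [matrix[r][c] for r in row_order for c in col_order]
    return count_runs(flat)

def best_order_brute_force(matrix):
    """Exhaustive search over both category orders; feasible only for tiny inputs.
    Returns (runs, row_order, col_order) for the first minimum found."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    best = None
    for ro in permutations(range(n_rows)):
        for co in permutations(range(n_cols)):
            runs = linearized_runs(matrix, ro, co)
            if best is None or runs < best[0]:
                best = (runs, ro, co)
    return best

# Hypothetical 3 x 4 presence matrix (rows: categories of A, columns: categories of B).
M = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]

print(linearized_runs(M, (0, 1, 2), (0, 1, 2, 3)))  # original order: 10 runs
print(best_order_brute_force(M)[0])                  # best order: 4 runs
```

Within each row, the value changes between adjacent columns, summed over all rows, equal the total Hamming (L1) distance between consecutive columns. Choosing a column order therefore resembles finding a short path through the columns under an L1 metric. That gives some intuition for the TSP connection in the summary, though the paper's actual reduction may be set up differently.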
Authors
1. Frank Olken
2. Doron Rotem
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,598 | Semantic Compression and Pattern Extraction with Fascicles | 1999 | VLDB | 0.00011202905 |
| 11,067 | Partition, Don’t Sort! Compression Boosters for Cloud Data Ingestion Pipelines | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 0 of 0 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,134 | Dictionary-based Order-preserving String Compression for Main Memory Column Stores | 2009 | SIGMOD | 0.00013761456 |
| 131 | Integrating Compression and Execution in Column-Oriented Database Systems | 2006 | SIGMOD | 0.0004370331 |
| 9,595 | High-Ratio Compression for Machine-Generated Data | 2023 | SIGMOD | 4.3194469e-05 |
| 1,443 | Compressing SQL Workloads | 2002 | SIGMOD | 0.00011947004 |
| 4,468 | Comprehensive and Efficient Workload Compression | 2021 | VLDB | 6.1584035e-05 |
| 8,578 | Robust and Budget-Constrained Encoding Configurations for In-Memory Database Systems | 2022 | VLDB | 4.4923477e-05 |
| 3,916 | Compressing Large Boolean Matrices Using Reordering Techniques | 2004 | VLDB | 6.6328898e-05 |
| 1,100 | Query Optimization In Compressed Database Systems | 2001 | SIGMOD | 0.00014072277 |
| 13,004 | Transposition Algorithms on Very Large Compressed Databases | 1986 | VLDB | 4.1945683e-05 |
| 5,898 | Column Partition and Permutation for Run Length Encoding in Columnar Databases | 2020 | SIGMOD | 5.2839046e-05 |