Database Paper Browser


Partition, Don’t Sort! Compression Boosters for Cloud Data Ingestion Pipelines

Summary: Rather than performing an expensive global sort, the paper clusters similarly structured nested, Dremel-encoded records at ingestion time to create partitions that compress well. A decision-tree-inspired clustering algorithm is up to 17.44× faster than partition-then-sort and improves compression by up to 2×, while sorting within each bucket matches the compression achieved by increasing-cardinality sorting at a lower ingestion cost. (summarized by gpt-5-mini on Feb 09 2026)
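The summary combines two ideas: group records that share a structure, then sort only within each group. The Python sketch below illustrates that general idea under simplifying assumptions and is not the paper's algorithm. A record's "shape" is taken to be its set of leaf field paths. Buckets are exact shape matches, a stand-in for the decision-tree-inspired clustering. Compressed size is approximated by counting run-length-encoding runs per column. Dremel repetition/definition levels and repeated fields are ignored. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Any

Record = dict[str, Any]


def field_paths(record: Record, prefix: str = "") -> frozenset[str]:
    """Leaf field paths present in a nested record, i.e. its structural 'shape'."""
    paths: set[str] = set()
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths |= field_paths(value, path)
        else:
            paths.add(path)
    return frozenset(paths)


def get_path(record: Record, path: str) -> Any:
    """Value at a dotted path, or None if the field is absent."""
    value: Any = record
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def partition_by_shape(records: list[Record]) -> list[list[Record]]:
    """Bucket records that share exactly the same set of leaf fields (hypothetical stand-in)."""
    buckets: dict[frozenset[str], list[Record]] = defaultdict(list)
    for record in records:
        buckets[field_paths(record)].append(record)
    return list(buckets.values())


def cardinality(bucket: list[Record], col: str) -> int:
    """Number of distinct values of one column within a bucket."""
    return len({repr(get_path(r, col)) for r in bucket})


def sort_bucket(bucket: list[Record]) -> tuple[list[Record], list[str]]:
    """Sort one bucket lexicographically, lowest-cardinality column first."""
    cols = sorted(field_paths(bucket[0]), key=lambda c: (cardinality(bucket, c), c))
    # repr() gives a total order over mixed value types; we only need identical
    # values to end up adjacent, not a semantically meaningful ordering.
    ordered = sorted(bucket, key=lambda r: tuple(repr(get_path(r, c)) for c in cols))
    return ordered, cols


def rle_runs(records: list[Record], cols: list[str]) -> int:
    """Total number of runs across columns: a crude proxy for RLE-encoded size."""
    total = 0
    for col in cols:
        prev: Any = object()  # sentinel that equals no real value
        for record in records:
            value = get_path(record, col)
            if value != prev:
                total += 1
                prev = value
    return total


def compare(records: list[Record]) -> tuple[int, int]:
    """(runs for one unsorted partition, runs after partitioning + per-bucket sort)."""
    all_cols = sorted(set().union(*(field_paths(r) for r in records)))
    baseline = rle_runs(records, all_cols)  # absent fields count as runs of None
    boosted = 0
    for bucket in partition_by_shape(records):
        ordered, cols = sort_bucket(bucket)
        boosted += rle_runs(ordered, cols)  # each bucket stores only its own columns
    return baseline, boosted


events = [
    {"user": "a", "event": {"type": "click", "x": 1}},
    {"user": "b", "event": {"type": "view"}},
    {"user": "a", "event": {"type": "click", "x": 2}},
    {"user": "b", "event": {"type": "view"}},
    {"user": "a", "event": {"type": "click", "x": 1}},
    {"user": "c", "event": {"type": "view"}},
]
baseline, boosted = compare(events)
print(baseline, boosted)  # 18 7
```

In this toy input, the interleaved "click" and "view" shapes break every run in arrival order. Splitting them into two buckets and sorting each bucket by its lowest-cardinality columns first collapses most columns to a single run. Sorting only inside small buckets also avoids the cost of one global sort, which is the trade-off the summary describes.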

Paper ID: 13556
Venue: VLDB
Year: 2024
Pagerank: 4.1945683e-05
Overall Rank: 11,067 | 23.01%
DOI: 10.14778/3681954.3682013

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 23 of 23 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
34 | Similarity Search in High Dimensions via Hashing | 1999 | VLDB | 0.00076637636
109 | Dremel: Interactive Analysis of Web-Scale Datasets | 2010 | VLDB | 0.00048186983
131 | Integrating Compression and Execution in Column-Oriented Database Systems | 2006 | SIGMOD | 0.0004370331
290 | Linear Clustering of Objects with Multiple Attributes | 1990 | SIGMOD | 0.00028919734
408 | Database Cracking | 2007 | CIDR | 0.00023953844
659 | The Making of TPC-DS | 2006 | VLDB | 0.00018500853
746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979
1,111 | Sybase IQ Multiplex – Designed For Analytics | 2004 | VLDB | 0.00013936696
2,062 | Dremel: A Decade of Interactive SQL Analysis at Web Scale | 2020 | VLDB | 9.6481955e-05
2,681 | NET-FLi: On-the-fly Compression, Archiving and Indexing of Streaming Network Traffic | 2010 | VLDB | 8.3232427e-05
3,737 | Skipping-oriented Partitioning for Columnar Layouts | 2017 | VLDB | 6.8033227e-05
3,779 | Instance-Optimized Data Layouts for Cloud Analytics Workloads | 2021 | SIGMOD | 6.7747205e-05
4,704 | JSON Tiles: Fast Analytics on Semi-Structured Data | 2021 | SIGMOD | 5.9853687e-05
5,898 | Column Partition and Permutation for Run Length Encoding in Columnar Databases | 2020 | SIGMOD | 5.2839046e-05
6,343 | Rearranging Data to Maximize the Efficiency of Compression | 1986 | PODS | 5.1026755e-05
6,466 | Pando: Enhanced Data Skipping with Logical Data Partitioning | 2023 | VLDB | 5.0528281e-05
6,674 | Exploiting Common Patterns for Tree-Structured Data | 2017 | SIGMOD | 4.9663344e-05
6,802 | Understanding Insights into the Basic Structure and Essential Issues of Table Placement Methods in Clusters | 2013 | VLDB | 4.9226626e-05
6,803 | Proteus: Autonomous Adaptive Storage for Mixed Workloads | 2022 | SIGMOD | 4.9224958e-05
7,112 | Wide Table Layout Optimization based on Column Ordering and Duplication | 2017 | SIGMOD | 4.8275068e-05
7,128 | Jigsaw: A Data Storage and Query Processing Engine for Irregular Table Partitioning | 2021 | SIGMOD | 4.8230171e-05
7,571 | Reducing Ambiguity in Json Schema Discovery | 2021 | SIGMOD | 4.7075853e-05
8,225 | Automated Multidimensional Data Layouts in Amazon Redshift | 2024 | SIGMOD | 4.555289e-05

Semantically Similar Papers