Database Paper Browser


Fine-grained Partitioning for Aggressive Data Skipping

Summary: Fine-grained blocking with per-block metadata enables aggressive data skipping. Each tuple is mapped to a feature vector whose bits record which frequent workload filters (mined as frequent itemsets) the tuple satisfies. The resulting Balanced MaxSkip Partitioning problem is NP-hard, so it is solved approximately with bottom-up clustering. In Shark, this yields 2–5x speedups over range-based blocking on TPC-H and real workloads. (summarized by gpt-5-nano on Feb 09 2026)
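The skipping mechanism in the summary can be illustrated with a toy sketch. It makes several assumptions. Features are given directly as Python predicates, whereas the paper mines them from the workload, and that mining step is omitted here. Every feature is weighted equally. A `min_size` lower bound on block size stands in for the balance constraint. The paper's clustering algorithm is replaced by a simple greedy merge: the smallest block is merged into whichever partner loses the least skipping. All names (`feature_vector`, `bottom_up_partition`, `blocks_to_scan`, `min_size`) are hypothetical. Each block's metadata is the bitwise OR of its rows' vectors. A query whose filter is feature i can therefore skip any block whose bit i is 0, because no row in that block satisfies the filter.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, int]
Feature = Callable[[Row], bool]   # hypothetical: one frequent workload filter
Vector = Tuple[int, ...]
Block = Tuple[Vector, List[int]]  # (OR of member vectors, row ids)


def feature_vector(row: Row, features: Sequence[Feature]) -> Vector:
    """Bit i is 1 iff the row satisfies feature i."""
    return tuple(int(f(row)) for f in features)


def union(a: Vector, b: Vector) -> Vector:
    """Block-level skip metadata: bitwise OR of the member rows' vectors."""
    return tuple(x | y for x, y in zip(a, b))


def skipped(block: Block) -> int:
    """Rows this block lets queries skip, summed over features (unweighted)."""
    vec, rows = block
    return len(rows) * vec.count(0)


def bottom_up_partition(rows: Sequence[Row], features: Sequence[Feature],
                        min_size: int) -> List[Block]:
    # Step 1: rows with identical feature vectors start in the same block.
    groups: Dict[Vector, List[int]] = defaultdict(list)
    for rid, row in enumerate(rows):
        groups[feature_vector(row, features)].append(rid)
    blocks: List[Block] = list(groups.items())

    # Skipping lost if blocks a and b are merged into one block.
    def loss(a: Block, b: Block) -> int:
        merged = (union(a[0], b[0]), a[1] + b[1])
        return skipped(a) + skipped(b) - skipped(merged)

    # Step 2 (simplified greedy stand-in for the paper's clustering):
    # merge the smallest block into the partner with the least loss,
    # until every block holds at least min_size rows.
    while len(blocks) > 1 and min(len(r) for _, r in blocks) < min_size:
        i = min(range(len(blocks)), key=lambda k: len(blocks[k][1]))
        j = min((k for k in range(len(blocks)) if k != i),
                key=lambda k: loss(blocks[i], blocks[k]))
        merged = (union(blocks[i][0], blocks[j][0]), blocks[i][1] + blocks[j][1])
        blocks = [b for k, b in enumerate(blocks) if k not in (i, j)] + [merged]
    return blocks


def blocks_to_scan(blocks: Sequence[Block], feature_idx: int) -> List[Block]:
    """A query filtered by feature i must read only blocks whose bit i is 1."""
    return [b for b in blocks if b[0][feature_idx] == 1]


# Usage: two hypothetical workload filters over a seven-row toy table.
features = [lambda r: r["a"] == 1, lambda r: r["b"] == 1]
rows = [{"a": 1, "b": 0}] * 3 + [{"a": 0, "b": 0}] + [{"a": 0, "b": 1}] * 3
blocks = bottom_up_partition(rows, features, min_size=3)
for vec, rids in blocks:
    print(vec, rids)  # (0, 1) [4, 5, 6] then (1, 0) [3, 0, 1, 2]
```

In this example, the lone row that matches neither filter is absorbed into a neighboring block. A query on either filter then reads one of the two blocks and skips the other.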

Paper ID: 4886
Venue: SIGMOD
Year: 2014
Pagerank: 0.00011770865
Overall Rank: 1,477 | 89.73%
DOI: 10.1145/2588555.2610515



Incoming Citations (Sorted by Pagerank)

Showing 41 of 41 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
167 | The Snowflake Elastic Data Warehouse | 2016 | SIGMOD | 0.00039180521
1,611 | Qd-tree: Learning Data Layouts for Big Data Analytics | 2020 | SIGMOD | 0.00011147324
2,083 | Towards a Learning Optimizer for Shared Clouds | 2019 | VLDB | 9.5834572e-05
2,320 | High-Throughput Vector Similarity Search in Knowledge Graphs | 2023 | SIGMOD | 9.0366225e-05
3,488 | Optimal Column Layout for Hybrid Workloads | 2019 | VLDB | 7.0479329e-05
3,608 | Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation | 2018 | SIGMOD | 6.924272e-05
3,737 | Skipping-oriented Partitioning for Columnar Layouts | 2017 | VLDB | 6.8033227e-05
3,779 | Instance-Optimized Data Layouts for Cloud Analytics Workloads | 2021 | SIGMOD | 6.7747205e-05
3,891 | Slalom: Coasting Through Raw Data via Adaptive Partitioning and Indexing | 2017 | VLDB | 6.659442e-05
3,922 | Pushing Data-Induced Predicates Through Joins in Big-Data Clusters | 2020 | VLDB | 6.6291079e-05
4,161 | Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe? | 2017 | SIGMOD | 6.3938006e-05
5,118 | AdaptDB: Adaptive Partitioning for Distributed Joins | 2017 | VLDB | 5.6820984e-05
5,119 | Design Tradeoffs of Data Access Methods | 2016 | SIGMOD | 5.6807904e-05
5,532 | A Padded Encoding Scheme to Accelerate Scans by Leveraging Skew | 2015 | SIGMOD | 5.4548897e-05
5,749 | BinDex: A Two-Layered Index for Fast and Robust Scans | 2020 | SIGMOD | 5.3418923e-05
6,149 | Crystal: A Unified Cache Storage System for Analytical Databases | 2021 | VLDB | 5.1847534e-05
6,398 | Endure: A Robust Tuning Paradigm for LSM Trees Under Workload Uncertainty | 2022 | VLDB | 5.0819209e-05
6,466 | Pando: Enhanced Data Skipping with Logical Data Partitioning | 2023 | VLDB | 5.0528281e-05
6,740 | Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing | 2021 | SIGMOD | 4.944395e-05
6,947 | QUILTS: Multidimensional Partitioning Framework Based on Query-Aware and Skew-Tolerant Space-Filling Curves | 2017 | SIGMOD | 4.8909129e-05
6,972 | Predicate Caching: Query-Driven Secondary Indexing for Cloud Data Warehouses | 2024 | SIGMOD | 4.8785237e-05
6,984 | Replicated Layout for In-Memory Database Systems | 2022 | VLDB | 4.873081e-05
7,053 | Statisticum: Data Statistics Management in SAP HANA | 2017 | VLDB | 4.8497195e-05
7,112 | Wide Table Layout Optimization based on Column Ordering and Duplication | 2017 | SIGMOD | 4.8275068e-05
7,128 | Jigsaw: A Data Storage and Query Processing Engine for Irregular Table Partitioning | 2021 | SIGMOD | 4.8230171e-05
7,483 | RTScan: Efficient Scan with Ray Tracing Cores | 2024 | VLDB | 4.7180617e-05
7,663 | Optimizing Collections of Bloom Filters within a Space Budget | 2024 | VLDB | 4.6857816e-05
8,103 | Grep: A Graph Learning Based Database Partitioning System | 2023 | SIGMOD | 4.5852201e-05
8,405 | Towards Designing and Learning Piecewise Space-Filling Curves | 2023 | VLDB | 4.5224126e-05
8,415 | Pruning in Snowflake: Working Smarter, Not Harder | 2025 | SIGMOD | 4.5197687e-05
8,447 | Cabin: a Compressed Adaptive Binned Scan Index | 2024 | SIGMOD | 4.5102052e-05
8,596 | Prompt: Dynamic Data-Partitioning for Distributed Micro-batch Stream Processing Systems | 2020 | SIGMOD | 4.4887993e-05
8,636 | WISK: A Workload-aware Learned Index for Spatial Keyword Queries | 2023 | SIGMOD | 4.4801284e-05
9,801 | Amoeba: A Shape changing Storage System for Big Data | 2016 | VLDB | 4.2815507e-05
10,179 | LiveBin: A Localized and Version-Aware Binned Scan Index | 2026 | SIGMOD | 4.1945683e-05
10,385 | Optimizing Block Skipping for High-Dimensional Data with Learned Adaptive Curve | 2025 | SIGMOD | 4.1945683e-05
10,404 | Dynamic Pruning for Recursive Joins | 2025 | SIGMOD | 4.1945683e-05
10,761 | SIEVE: Effective Filtered Vector Search with Collection of Indexes | 2025 | VLDB | 4.1945683e-05
11,212 | SH2O: Efficient Data Access for Work-Sharing Databases | 2023 | SIGMOD | 4.1945683e-05
11,572 | Workload-Aware Column Imprints | 2020 | SIGMOD | 4.1945683e-05
11,993 | A Partitioning Framework for Aggressive Data Skipping | 2014 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 19 of 19 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
11 | Implementing Data Cubes Efficiently | 1996 | SIGMOD | 0.0011708144
33 | BIRCH: An Efficient Data Clustering Method for Very Large Databases | 1996 | SIGMOD | 0.00077324389
70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166
109 | Dremel: Interactive Analysis of Web-Scale Datasets | 2010 | VLDB | 0.00048186983
158 | Automated Selection of Materialized Views and Indexes for SQL Databases | 2000 | VLDB | 0.00040071492
209 | Schism: a Workload-Driven Approach to Database Replication and Partitioning | 2010 | VLDB | 0.00034468292
241 | DB2 with BLU Acceleration: So Much More than Just a Column Store | 2013 | VLDB | 0.00031420034
285 | Automating Physical Database Design in a Parallel Database | 2002 | SIGMOD | 0.0002899128
286 | Integrating Vertical and Horizontal Partitioning into Automated Physical Database Design | 2004 | SIGMOD | 0.00028990057
310 | The Vertica Analytic Database: C-Store 7 Years Later | 2012 | VLDB | 0.00028132402
368 | Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing | 1998 | VLDB | 0.000254931
408 | Database Cracking | 2007 | CIDR | 0.00023953844
542 | Shark: SQL and Rich Analytics at Scale | 2013 | SIGMOD | 0.00020595648
681 | Materialized View Selection in a Multidimensional Database | 1997 | VLDB | 0.00018203591
1,470 | Processing a Trillion Cells per Mouse Click | 2012 | VLDB | 0.00011833779
1,471 | Adaptive Range Filters for Cold Data: Avoiding Trips to Siberia | 2013 | VLDB | 0.00011830111
2,444 | Brighthouse: An Analytic Data Warehouse for Ad-hoc Queries | 2008 | VLDB | 8.8076551e-05
3,028 | Efficient Query Processing for Multi-Dimensionally Clustered Tables in DB2 | 2003 | VLDB | 7.6816205e-05
4,061 | Advanced Partitioning Techniques for Massively Distributed Computation | 2012 | SIGMOD | 6.483587e-05

Semantically Similar Papers