Selection Pushdown in Column Stores using Bit Manipulation Instructions

Summary: A generic predicate pushdown technique that evaluates selections directly on encoded columnar data, without decoding it first, using Bit Manipulation Instructions (BMI). Evaluations on Parquet with TPC-H and on Spark show scan speedups of up to 10x and end-to-end speedups of up to 5.5x on queries with complex joins. (summarized by gpt-5-nano on Feb 09 2026)
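
To make "direct selection without decoding" concrete, here is a minimal sketch of how a select operator on bit-packed values might use the BMI2 PEXT (parallel bit extract) instruction. It is an illustration under stated assumptions, not the paper's implementation: PEXT is emulated in plain Python, the select bitmap is widened with a simple loop, and the names pext, pack, unpack, extend_bitmap, and select_packed are hypothetical helpers invented for this example.

```python
def pext(src: int, mask: int) -> int:
    """Software emulation of PEXT: gather the bits of src at the positions
    set in mask and pack them contiguously, starting at bit 0."""
    out = 0
    out_pos = 0
    while mask:
        low = mask & -mask            # lowest set bit still in the mask
        if src & low:
            out |= 1 << out_pos
        out_pos += 1
        mask ^= low                   # clear that bit and continue
    return out


def pack(values, width):
    """Bit-pack values, each using `width` bits, value 0 in the lowest bits."""
    field = (1 << width) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & field) << (i * width)
    return word


def unpack(word, width, count):
    """Decode `count` values of `width` bits each (used here only to check results)."""
    field = (1 << width) - 1
    return [(word >> (i * width)) & field for i in range(count)]


def extend_bitmap(bitmap, width, count):
    """Widen a one-bit-per-value select bitmap into a one-bit-per-bit mask.
    Assumption: a plain loop for clarity, not a bit-parallel method."""
    field = (1 << width) - 1
    mask = 0
    for i in range(count):
        if (bitmap >> i) & 1:
            mask |= field << (i * width)
    return mask


def select_packed(word, bitmap, width, count):
    """Return (packed selected values, number selected) without decoding."""
    mask = extend_bitmap(bitmap, width, count)
    selected = bin(bitmap & ((1 << count) - 1)).count("1")
    return pext(word, mask), selected


values = [5, 1, 7, 3, 0, 6]           # six 3-bit values
packed = pack(values, 3)
bitmap = 0b101001                     # select positions 0, 3 and 5
result, n = select_packed(packed, bitmap, 3, len(values))
print(unpack(result, 3, n))           # [5, 3, 6]
```

The output is itself bit-packed, so a later operator could keep working on the encoded form; on hardware with BMI2, the emulated pext loop would correspond to a single instruction per machine word.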

Paper ID: 6682
Venue: SIGMOD
Year: 2023
Pagerank: 4.7282014e-05
Overall Rank: 7,427 | 48.39%
DOI: 10.1145/3589323

Authors

Incoming Citations (Sorted by Pagerank)

Showing 7 of 7 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
4,516	An Empirical Evaluation of Columnar Storage Formats	2024	VLDB	6.1146215e-05
9,646	The FastLanes File Format	2025	VLDB	4.3067693e-05
9,846	HyperBlocker: Accelerating Rule-based Blocking in Entity Resolution using GPUs	2025	VLDB	4.2680295e-05
10,504	Nested Parquet Is Flat, Why Not Use It? How To Scan Nested Data With On-the-Fly Key Generation and Joins	2025	SIGMOD	4.1905499e-05
10,755	Scaling GPU-Accelerated Databases beyond GPU Memory Size	2025	VLDB	4.1905499e-05
10,808	GraphAr: An Efficient Storage Scheme for Graph Data in Data Lakes	2025	VLDB	4.1905499e-05
10,858	LiquidCache: Efficient Pushdown Caching for Cloud-Native Data Analytics	2025	VLDB	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 21 of 21 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
20	C-Store: A Column-oriented DBMS	2005	VLDB	0.00086163998
35	MonetDB/X100: Hyper-Pipelining Query Execution	2005	CIDR	0.00076209479
66	Spark SQL: Relational Data Processing in Spark	2015	SIGMOD	0.00061707583
80	Weaving Relations for Cache Performance	2001	VLDB	0.00055735291
109	Dremel: Interactive Analysis of Web-Scale Datasets	2010	VLDB	0.00048217028
132	Integrating Compression and Execution in Column-Oriented Database Systems	2006	SIGMOD	0.00043697853
141	Selectivity Estimation Without the Attribute Value Independence Assumption	1997	VLDB	0.00041819767
167	The Snowflake Elastic Data Warehouse	2016	SIGMOD	0.00039408116
307	SIMD-Scan: Ultra Fast in-Memory Table Scan using on-Chip Vector Processing Units	2009	VLDB	0.00028226342
739	Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores	2020	VLDB	0.00017365933
959	Rethinking SIMD Vectorization for In-Memory Databases	2015	SIGMOD	0.00015034808
1,267	BitWeaving: Fast Scans for Main Memory Data Processing	2013	SIGMOD	0.00012917585
1,356	Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics	2021	CIDR	0.00012409986
1,622	Row-wise Parallel Predicate Evaluation	2008	VLDB	0.00011104582
2,390	ByteSlice: Pushing the Envelop of Main Memory Data Processing with a New Storage Layout	2015	SIGMOD	8.9006978e-05
2,472	Photon: A Fast Query Engine for Lakehouse Systems	2022	SIGMOD	8.7156826e-05
2,706	Filter Before You Parse: Faster Analytics on Raw Data with Sparser	2018	VLDB	8.2655235e-05
2,825	Mison: A Fast JSON Parser for Data Analytics	2017	VLDB	8.0575959e-05
3,923	Pushing Data-Induced Predicates Through Joins in Big-Data Clusters	2020	VLDB	6.6232068e-05
4,672	FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS	2021	VLDB	6.001444e-05
6,152	Crystal: A Unified Cache Storage System for Analytical Databases	2021	VLDB	5.1802666e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
5,541	A Padded Encoding Scheme to Accelerate Scans by Leveraging Skew	2015	SIGMOD	5.4501856e-05
7,095	Fast Multi-Column Sorting in Main-Memory Column-Stores	2016	SIGMOD	4.8289712e-05
9,905	Rethinking the Encoding of Integers for Scans on Skewed Data	2023	SIGMOD	4.2537799e-05
6,366	Good to the Last Bit: Data-Driven Encoding with CodecDB	2021	SIGMOD	5.0892171e-05
9,671	BIPie: Fast Selection and Aggregation on Encoded Data using Operator Specialization	2018	SIGMOD	4.302191e-05
1,622	Row-wise Parallel Predicate Evaluation	2008	VLDB	0.00011104582
6,372	Optimization of Conjunctive Predicates for Main Memory Column Stores	2016	VLDB	5.0878306e-05
4,516	An Empirical Evaluation of Columnar Storage Formats	2024	VLDB	6.1146215e-05
9,625	Optimization of Disjunctive Predicates for Main Memory Column Stores	2017	SIGMOD	4.3115918e-05
3,611	Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation	2018	SIGMOD	6.9178844e-05