Database Paper Browser

Photon: A Fast Query Engine for Lakehouse Systems

Summary: Photon is a vectorized query engine for Lakehouse systems. It runs fast queries over raw data lakes and Parquet files while remaining compatible with the Spark API. The paper covers its design choices (vectorization vs. code generation), its memory manager, and its integration with SQL and Spark. Together these yield 10x speedups and a record result on the 100TB TPC-DS benchmark. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6430
Venue: SIGMOD
Year: 2022
Pagerank: 8.7237281e-05
Overall Rank: 2,473 | 82.80%
DOI: 10.1145/3514221.3526054

Authors

Incoming Citations (Sorted by Pagerank)

Showing 28 of 28 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
3,644 | BtrBlocks: Efficient Columnar Compression for Data Lakes | 2023 | SIGMOD | 6.8854928e-05
4,239 | The Composable Data Management System Manifesto | 2023 | VLDB | 6.3318452e-05
4,495 | ClickHouse - Lightning Fast Analytics for Everyone | 2024 | VLDB | 6.1410277e-05
4,870 | Exploiting Cloud Object Storage for High-Performance Analytics | 2023 | VLDB | 5.8613885e-05
5,318 | Analyzing and Comparing Lakehouse Storage Systems | 2023 | CIDR | 5.5715872e-05
5,531 | Presto: A Decade of SQL Analytics at Meta | 2023 | SIGMOD | 5.4549499e-05
6,340 | Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine | 2024 | SIGMOD | 5.1051018e-05
7,059 | Adaptive and Robust Query Execution for Lakehouses at Scale | 2024 | VLDB | 4.8477825e-05
7,427 | Selection Pushdown in Column Stores using Bit Manipulation Instructions | 2023 | SIGMOD | 4.7327406e-05
7,546 | Is Perfect Hashing Practical for OLAP Systems? | 2024 | CIDR | 4.7148429e-05
7,814 | Deep Lake: a Lakehouse for Deep Learning | 2023 | CIDR | 4.6439001e-05
7,916 | Terabyte-Scale Analytics in the Blink of an Eye | 2026 | VLDB | 4.6173899e-05
8,173 | Sigma Workbook: A Spreadsheet for Cloud Data Warehouses | 2022 | VLDB | 4.568186e-05
8,479 | Excalibur: A Virtual Machine for Adaptive Fine-grained JIT-Compiled Query Execution based on VOILA | 2023 | VLDB | 4.5014929e-05
8,856 | Composable Data Management: An Execution Overview | 2024 | VLDB | 4.4346165e-05
9,857 | Towards Unifying Query Interpretation and Compilation | 2023 | CIDR | 4.269353e-05
9,973 | End-to-End Declarative Data Analytics: Co-designing Engines, Interfaces, and Cloud Infrastructure | 2026 | CIDR | 4.1945683e-05
10,372 | Data Chunk Compaction in Vectorized Execution | 2025 | SIGMOD | 4.1945683e-05
10,494 | Nested Parquet Is Flat, Why Not Use It? How To Scan Nested Data With On-the-Fly Key Generation and Joins | 2025 | SIGMOD | 4.1945683e-05
10,714 | Towards Designing Future-Proof Data Processing Systems | 2025 | VLDB | 4.1945683e-05
10,767 | The HANA Native Query Engine for Lakehouse Systems | 2025 | VLDB | 4.1945683e-05
10,777 | Magnus: A Holistic Approach to Data Management for Large-Scale Machine Learning Workloads | 2025 | VLDB | 4.1945683e-05
10,787 | AnalyticDB-PG: A Cloud-native High-performance Data Warehouse in Alibaba Cloud | 2025 | VLDB | 4.1945683e-05
10,803 | GraphAr: An Efficient Storage Scheme for Graph Data in Data Lakes | 2025 | VLDB | 4.1945683e-05
10,854 | LiquidCache: Efficient Pushdown Caching for Cloud-Native Data Analytics | 2025 | VLDB | 4.1945683e-05
10,969 | Query Compilation Without Regrets | 2024 | SIGMOD | 4.1945683e-05
11,090 | Simple (yet Efficient) Function Authoring for Vectorized Engines | 2024 | VLDB | 4.1945683e-05
11,269 | Big Data Analytic Toolkit: A general-purpose, modular, and heterogeneous acceleration toolkit for data analytical engines | 2023 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 19 of 19 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
35 | MonetDB/X100: Hyper-Pipelining Query Execution | 2005 | CIDR | 0.00076197749
52 | Database Architecture Optimized for the new Bottleneck: Memory Access | 1999 | VLDB | 0.00066474881
60 | Efficiently Compiling Efficient Query Plans for Modern Hardware | 2011 | VLDB | 0.00064439773
66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801
113 | Encapsulation of Parallelism in the Volcano Query Processing System | 1990 | SIGMOD | 0.00046764513
167 | The Snowflake Elastic Data Warehouse | 2016 | SIGMOD | 0.00039180521
310 | The Vertica Analytic Database: C-Store 7 Years Later | 2012 | VLDB | 0.00028132402
343 | Implementing Database Operations Using SIMD Instructions | 2002 | SIGMOD | 0.00026768139
413 | HaLoop: Efficient Iterative Data Processing on Large Clusters | 2010 | VLDB | 0.00023904409
746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979
853 | Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask | 2018 | VLDB | 0.00015940507
958 | Rethinking SIMD Vectorization for In-Memory Databases | 2015 | SIGMOD | 0.00015045316
1,263 | Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation | 2016 | SIGMOD | 0.00012982857
1,377 | Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics | 2021 | CIDR | 0.00012296941
1,750 | Weld: A Common Runtime for High Performance Data Analytics | 2017 | CIDR | 0.00010683647
1,864 | Relaxed Operator Fusion for In-Memory Databases: Making Compilation, Vectorization, and Prefetching Work Together At Last | 2018 | VLDB | 0.00010280966
2,896 | Evaluating End-to-End Optimization for Data Analytics Applications in Weld | 2018 | VLDB | 7.9452051e-05
3,882 | Micro Adaptivity in Vectorwise | 2013 | SIGMOD | 6.6690423e-05
4,281 | Maximizing Persistent Memory Bandwidth Utilization for OLAP Workloads | 2021 | SIGMOD | 6.2940039e-05

Semantically Similar Papers