Database Paper Browser

Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics

Summary: Proposes the Lakehouse, an open-format (Parquet) platform that unifies data warehousing with first-class support for ML and data science, addressing data staleness, lock-in, reliability, and total cost of ownership (TCO). Presents a Parquet-based prototype that matches cloud data warehouses on TPC-DS and discusses research implications. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 416
Venue: CIDR
Year: 2021
Pagerank: 0.00012296941
Overall Rank: 1,377 | 90.43%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 44 of 44 citing papers.

Rank Citing Paper Year Venue Pagerank
2,473 Photon: A Fast Query Engine for Lakehouse Systems 2022 SIGMOD 8.7237281e-05
3,644 BtrBlocks: Efficient Columnar Compression for Data Lakes 2023 SIGMOD 6.8854928e-05
4,514 An Empirical Evaluation of Columnar Storage Formats 2024 VLDB 6.1204636e-05
4,557 Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches 2021 VLDB 6.087611e-05
4,863 Data-Sharing Markets: Model, Protocol, and Algorithms to Incentivize the Formation of Data-Sharing Consortia 2023 SIGMOD 5.8697471e-05
5,318 Analyzing and Comparing Lakehouse Storage Systems 2023 CIDR 5.5715872e-05
5,562 A Deep Dive into Common Open Formats for Analytical DBMSs 2023 VLDB 5.4331334e-05
5,640 AutoSteer: Learned Query Optimization for Any SQL Database 2023 VLDB 5.3933314e-05
6,261 The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward 2021 VLDB 5.1350714e-05
6,541 ConnectorX: Accelerating Data Loading From Databases to Dataframes 2022 VLDB 5.0216945e-05
6,715 Shared Foundations: Modernizing Meta's Data Lakehouse 2023 CIDR 4.9509939e-05
7,059 Adaptive and Robust Query Execution for Lakehouses at Scale 2024 VLDB 4.8477825e-05
7,427 Selection Pushdown in Column Stores using Bit Manipulation Instructions 2023 SIGMOD 4.7327406e-05
7,469 Bullion: A Column Store for Machine Learning 2025 CIDR 4.7204398e-05
7,814 Deep Lake: a Lakehouse for Deep Learning 2023 CIDR 4.6439001e-05
7,907 Petabyte-Scale Row-Level Operations in Data Lakehouses 2024 VLDB 4.6205839e-05
8,608 Unity Catalog: Open and Universal Governance for the Lakehouse and Beyond 2025 SIGMOD 4.4853979e-05
9,016 Making Data Engineering Declarative 2023 CIDR 4.4094312e-05
9,093 Databricks Lakeguard: Supporting Fine-grained Access Control and Multi-user Capabilities for Apache Spark Workloads 2025 SIGMOD 4.398149e-05
9,111 Meta's Next-generation Realtime Monitoring and Analytics Platform 2022 VLDB 4.3942367e-05
9,379 GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example 2023 SIGMOD 4.3462787e-05
9,645 The FastLanes File Format 2025 VLDB 4.3109001e-05
9,689 LST-Bench: Benchmarking Log-Structured Tables in the Cloud 2024 SIGMOD 4.3043822e-05
9,701 Towards Functional Decomposition of Storage Formats 2025 CIDR 4.3008468e-05
9,760 Adaptive data transformations for QaaS 2025 CIDR 4.2856106e-05
9,808 Photon: A High-Performance Query Engine for the Lakehouse 2022 CIDR 4.2794025e-05
9,901 AnyBlox: A Framework for Self-Decoding Datasets 2025 VLDB 4.258022e-05
9,917 Check Out the Big Brain on BRAD: Simplifying Cloud Data Processing with Learned Automated Data Meshes 2023 VLDB 4.2561557e-05
9,973 End-to-End Declarative Data Analytics: Co-designing Engines, Interfaces, and Cloud Infrastructure 2026 CIDR 4.1945683e-05
10,196 PTO: A Workload-driven Predictive Table Optimizer for Lakehouse Systems 2026 SIGMOD 4.1945683e-05
10,248 Active Data Lakes: Regaining Physical Data Independence Without Losing Interoperability 2026 VLDB 4.1945683e-05
10,415 SAP HANA Cloud: Data Management for Modern Enterprise Applications 2025 SIGMOD 4.1945683e-05
10,571 Quantum Data Management in the NISQ Era 2025 VLDB 4.1945683e-05
10,607 Hermes: Off-the-Shelf Real-Time Transactional Analytics 2025 VLDB 4.1945683e-05
10,609 LogCloud: Fast Search of Compressed Logs on Object Storage 2025 VLDB 4.1945683e-05
10,711 Cracking Vector Search Indexes 2025 VLDB 4.1945683e-05
10,767 The HANA Native Query Engine for Lakehouse Systems 2025 VLDB 4.1945683e-05
10,777 Magnus: A Holistic Approach to Data Management for Large-Scale Machine Learning Workloads 2025 VLDB 4.1945683e-05
10,789 Ursa: A Lakehouse-Native Data Streaming Engine for Kafka 2025 VLDB 4.1945683e-05
10,803 GraphAr: An Efficient Storage Scheme for Graph Data in Data Lakes 2025 VLDB 4.1945683e-05
10,846 Disaggregation: A New Architecture for Cloud Databases 2025 VLDB 4.1945683e-05
10,854 LiquidCache: Efficient Pushdown Caching for Cloud-Native Data Analytics 2025 VLDB 4.1945683e-05
10,856 Analyzing Near-Network Hardware Acceleration with Co-Processing on DPUs 2025 VLDB 4.1945683e-05
11,150 Zed: Leveraging Data Types to Process Eclectic Data 2023 CIDR 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
