Active Data Lakes: Regaining Physical Data Independence Without Losing Interoperability
Summary: Proposes Active Data Lakes to restore physical data independence in data lakes without sacrificing cross-engine interoperability. The key idea is to decouple query engines from Parquet-centric storage through an architecture that supports new file formats, access paths, and storage media. The approach is validated with three prototype optimizations.
(summarized by gpt-5.4-mini on Apr 12 2026)
- Paper ID: 14284
- Venue: VLDB
- Year: 2026
- Pagerank: 4.1945683e-05
- Overall Rank: 10,248 | 28.71%
- DOI: 10.14778/3797919.3797941
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 22 of 22 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 80 | Weaving Relations for Cache Performance | 2001 | VLDB | 0.00055721729 |
| 167 | The Snowflake Elastic Data Warehouse | 2016 | SIGMOD | 0.00039180521 |
| 544 | Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources | 2018 | SIGMOD | 0.00020521965 |
| 746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979 |
| 1,377 | Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics | 2021 | CIDR | 0.00012296941 |
| 2,249 | Orca: A Modular Query Optimizer Architecture for Big Data | 2014 | SIGMOD | 9.2034693e-05 |
| 2,528 | Velox: Meta’s Unified Execution Engine | 2022 | VLDB | 8.59454e-05 |
| 3,178 | Why TPC Is Not Enough: An Analysis of the Amazon Redshift Fleet | 2024 | VLDB | 7.4325992e-05 |
| 3,644 | BtrBlocks: Efficient Columnar Compression for Data Lakes | 2023 | SIGMOD | 6.8854928e-05 |
| 4,239 | The Composable Data Management System Manifesto | 2023 | VLDB | 6.3318452e-05 |
| 4,518 | The FastLanes Compression Layout: Decoding >100 Billion Integers per Second with Scalar Code | 2023 | VLDB | 6.117844e-05 |
| 4,870 | Exploiting Cloud Object Storage for High-Performance Analytics | 2023 | VLDB | 5.8613885e-05 |
| 6,340 | Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine | 2024 | SIGMOD | 5.1051018e-05 |
| 6,402 | BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse | 2024 | SIGMOD | 5.079818e-05 |
| 6,525 | Database Technology for the Masses: Sub-Operators as First-Class Entities | 2021 | VLDB | 5.027205e-05 |
| 6,863 | Declarative Sub-Operators for Universal Data Processing | 2023 | VLDB | 4.905092e-05 |
| 8,608 | Unity Catalog: Open and Universal Governance for the Lakehouse and Beyond | 2025 | SIGMOD | 4.4853979e-05 |
| 9,093 | Databricks Lakeguard: Supporting Fine-grained Access Control and Multi-user Capabilities for Apache Spark Workloads | 2025 | SIGMOD | 4.398149e-05 |
| 9,201 | F3: The Open-Source Data File Format for the Future | 2026 | SIGMOD | 4.3743539e-05 |
| 9,645 | The FastLanes File Format | 2025 | VLDB | 4.3109001e-05 |
| 9,901 | AnyBlox: A Framework for Self-Decoding Datasets | 2025 | VLDB | 4.258022e-05 |
| 9,975 | Cloudspecs: Cloud Hardware Evolution Through the Looking Glass | 2026 | CIDR | 4.1945683e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 5,318 | Analyzing and Comparing Lakehouse Storage Systems | 2023 | CIDR | 5.5715872e-05 |
| 6,402 | BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse | 2024 | SIGMOD | 5.079818e-05 |
| 10,767 | The HANA Native Query Engine for Lakehouse Systems | 2025 | VLDB | 4.1945683e-05 |
| 13,277 | The Challenge of Building Effective Data Lakes | 2020 | SIGMOD | - |
| 9,232 | AutoComp: Automated Data Compaction for Log-Structured Tables in Data Lakes | 2025 | SIGMOD | 4.3690661e-05 |
| 9,701 | Towards Functional Decomposition of Storage Formats | 2025 | CIDR | 4.3008468e-05 |
| 5,562 | A Deep Dive into Common Open Formats for Analytical DBMSs | 2023 | VLDB | 5.4331334e-05 |
| 1,377 | Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics | 2021 | CIDR | 0.00012296941 |
| 7,059 | Adaptive and Robust Query Execution for Lakehouses at Scale | 2024 | VLDB | 4.8477825e-05 |
| 7,907 | Petabyte-Scale Row-Level Operations in Data Lakehouses | 2024 | VLDB | 4.6205839e-05 |