Database Paper Browser

Active Data Lakes: Regaining Physical Data Independence Without Losing Interoperability

Summary: Proposes Active Data Lakes to restore physical data independence in data lakes without sacrificing cross-engine interoperability. Key idea: decouple engines from Parquet-centric storage via an architecture that supports novel file formats, access paths, and media, validated by three prototype optimizations. (summarized by gpt-5.4-mini on Apr 12 2026)

Paper ID: 14284
Venue: VLDB
Year: 2026
Pagerank: 4.1945683e-05
Overall Rank: 10,248 | 28.71%
DOI: 10.14778/3797919.3797941

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 22 of 22 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
80 | Weaving Relations for Cache Performance | 2001 | VLDB | 0.00055721729
167 | The Snowflake Elastic Data Warehouse | 2016 | SIGMOD | 0.00039180521
544 | Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources | 2018 | SIGMOD | 0.00020521965
746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979
1,377 | Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics | 2021 | CIDR | 0.00012296941
2,249 | Orca: A Modular Query Optimizer Architecture for Big Data | 2014 | SIGMOD | 9.2034693e-05
2,528 | Velox: Meta’s Unified Execution Engine | 2022 | VLDB | 8.59454e-05
3,178 | Why TPC Is Not Enough: An Analysis of the Amazon Redshift Fleet | 2024 | VLDB | 7.4325992e-05
3,644 | BtrBlocks: Efficient Columnar Compression for Data Lakes | 2023 | SIGMOD | 6.8854928e-05
4,239 | The Composable Data Management System Manifesto | 2023 | VLDB | 6.3318452e-05
4,518 | The FastLanes Compression Layout: Decoding >100 Billion Integers per Second with Scalar Code | 2023 | VLDB | 6.117844e-05
4,870 | Exploiting Cloud Object Storage for High-Performance Analytics | 2023 | VLDB | 5.8613885e-05
6,340 | Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine | 2024 | SIGMOD | 5.1051018e-05
6,402 | BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse | 2024 | SIGMOD | 5.079818e-05
6,525 | Database Technology for the Masses: Sub-Operators as First-Class Entities | 2021 | VLDB | 5.027205e-05
6,863 | Declarative Sub-Operators for Universal Data Processing | 2023 | VLDB | 4.905092e-05
8,608 | Unity Catalog: Open and Universal Governance for the Lakehouse and Beyond | 2025 | SIGMOD | 4.4853979e-05
9,093 | Databricks Lakeguard: Supporting Fine-grained Access Control and Multi-user Capabilities for Apache Spark Workloads | 2025 | SIGMOD | 4.398149e-05
9,201 | F3: The Open-Source Data File Format for the Future | 2026 | SIGMOD | 4.3743539e-05
9,645 | The FastLanes File Format | 2025 | VLDB | 4.3109001e-05
9,901 | AnyBlox: A Framework for Self-Decoding Datasets | 2025 | VLDB | 4.258022e-05
9,975 | Cloudspecs: Cloud Hardware Evolution Through the Looking Glass | 2026 | CIDR | 4.1945683e-05

Semantically Similar Papers