Active Data Lakes: Regaining Physical Data Independence Without Losing Interoperability

Summary: Proposes Active Data Lakes to restore physical data independence in data lakes without sacrificing cross-engine interoperability. Key idea: decouple query engines from Parquet-centric storage through an architecture that admits novel file formats, access paths, and storage media; the approach is validated with three prototype optimizations. (summarized by gpt-5.4-mini on Apr 12 2026)

Paper ID: 14285
Venue: VLDB
Year: 2026
Pagerank: 4.1905499e-05
Overall Rank: 10,248 | 28.78%
DOI: 10.14778/3797919.3797941

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

1. Pascal Ginter
2. Viktor Leis

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 22 of 22 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Overall Rank	Cited Paper	Year	Venue	Pagerank
80	Weaving Relations for Cache Performance	2001	VLDB	0.00055735291
167	The Snowflake Elastic Data Warehouse	2016	SIGMOD	0.00039408116
542	Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources	2018	SIGMOD	0.00020522627
739	Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores	2020	VLDB	0.00017365933
1,356	Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics	2021	CIDR	0.00012409986
2,247	Orca: A Modular Query Optimizer Architecture for Big Data	2014	SIGMOD	9.201975e-05
2,533	Velox: Meta’s Unified Execution Engine	2022	VLDB	8.5870599e-05
3,131	Why TPC Is Not Enough: An Analysis of the Amazon Redshift Fleet	2024	VLDB	7.5054309e-05
3,642	BtrBlocks: Efficient Columnar Compression for Data Lakes	2023	SIGMOD	6.8876984e-05
4,241	The Composable Data Management System Manifesto	2023	VLDB	6.3258298e-05
4,520	The FastLanes Compression Layout: Decoding >100 Billion Integers per Second with Scalar Code	2023	VLDB	6.1119645e-05
4,873	Exploiting Cloud Object Storage for High-Performance Analytics	2023	VLDB	5.8557568e-05
6,332	Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine	2024	SIGMOD	5.1021765e-05
6,397	BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse	2024	SIGMOD	5.0749432e-05
6,525	Database Technology for the Masses: Sub-Operators as First-Class Entities	2021	VLDB	5.0223854e-05
6,863	Declarative Sub-Operators for Universal Data Processing	2023	VLDB	4.9003859e-05
8,607	Unity Catalog: Open and Universal Governance for the Lakehouse and Beyond	2025	SIGMOD	4.4810982e-05
9,090	Databricks Lakeguard: Supporting Fine-grained Access Control and Multi-user Capabilities for Apache Spark Workloads	2025	SIGMOD	4.3939337e-05
9,203	F3: The Open-Source Data File Format for the Future	2026	SIGMOD	4.3701616e-05
9,646	The FastLanes File Format	2025	VLDB	4.3067693e-05
9,900	AnyBlox: A Framework for Self-Decoding Datasets	2025	VLDB	4.2539423e-05
9,974	Cloudspecs: Cloud Hardware Evolution Through the Looking Glass	2026	CIDR	4.1905499e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
5,105	Analyzing and Comparing Lakehouse Storage Systems	2023	CIDR	5.6916018e-05
6,397	BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse	2024	SIGMOD	5.0749432e-05
10,773	The HANA Native Query Engine for Lakehouse Systems	2025	VLDB	4.1905499e-05
13,290	The Challenge of Building Effective Data Lakes	2020	SIGMOD	-
9,236	AutoComp: Automated Data Compaction for Log-Structured Tables in Data Lakes	2025	SIGMOD	4.3648789e-05
9,700	Towards Functional Decomposition of Storage Formats	2025	CIDR	4.2967256e-05
5,571	A Deep Dive into Common Open Formats for Analytical DBMSs	2023	VLDB	5.4279553e-05
1,356	Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics	2021	CIDR	0.00012409986
6,683	Adaptive and Robust Query Execution for Lakehouses at Scale	2024	VLDB	4.9593505e-05
7,907	Petabyte-Scale Row-Level Operations in Data Lakehouses	2024	VLDB	4.6161532e-05