Petabyte-Scale Row-Level Operations in Data Lakehouses

Summary: The paper adds petabyte-scale row-level updates and deletes to Iceberg and Spark, applied either eagerly through file-level materialization or lazily through equality and position deletes. It avoids shuffles with storage-partitioned joins, cuts write amplification with runtime filters, and uses adaptive writes. It also compares the tradeoffs across use cases and reports gains of roughly 10×. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 13616
Venue: VLDB
Year: 2024
Pagerank: 4.6161532e-05
Overall Rank: 7,907 | 45.05%
DOI: 10.14778/3685800.3685834

Incoming Citations (Sorted by Pagerank)

Showing 3 of 3 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
9,236	AutoComp: Automated Data Compaction for Log-Structured Tables in Data Lakes	2025	SIGMOD	4.3648789e-05
10,743	TreeCat: Standalone Catalog Engine for Large Data Systems	2025	VLDB	4.1905499e-05
10,783	Magnus: A Holistic Approach to Data Management for Large-Scale Machine Learning Workloads	2025	VLDB	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 21 of 21 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
20	C-Store: A Column-oriented DBMS	2005	VLDB	0.00086163998
23	A Critique of ANSI SQL Isolation Levels	1995	SIGMOD	0.00083899338
32	Differential Files: Their Application To The Maintenance Of Large Data Bases	1976	SIGMOD	0.00077553033
35	MonetDB/X100: Hyper-Pipelining Query Execution	2005	CIDR	0.00076209479
66	Spark SQL: Relational Data Processing in Spark	2015	SIGMOD	0.00061707583
167	The Snowflake Elastic Data Warehouse	2016	SIGMOD	0.00039408116
348	Serializable Isolation for Snapshot Databases	2008	SIGMOD	0.00026473778
367	Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing	1998	VLDB	0.00025518228
424	Amazon Redshift and the Case for Simpler Data Warehouses	2015	SIGMOD	0.00023604384
739	Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores	2020	VLDB	0.00017365933
789	Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)	2010	VLDB	0.00016602215
1,356	Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics	2021	CIDR	0.00012409986
1,949	Positional Update Handling in Column Stores	2010	SIGMOD	9.9934641e-05
2,060	Dremel: A Decade of Interactive SQL Analysis at Web Scale	2020	VLDB	9.6585115e-05
2,441	CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop	2011	VLDB	8.8106295e-05
3,966	Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing	2019	SIGMOD	6.5782437e-05
4,327	Big Metadata: When Metadata is Big Data	2021	VLDB	6.2765351e-05
5,105	Analyzing and Comparing Lakehouse Storage Systems	2023	CIDR	5.6916018e-05
5,587	Magnet: Push-based Shuffle Service for Large-scale Data Processing	2020	VLDB	5.4193445e-05
6,257	VectorH: Taking SQL-on-Hadoop to the Next Level	2016	SIGMOD	5.1316025e-05
8,596	LST-Bench: Benchmarking Log-Structured Tables in the Cloud	2024	SIGMOD	4.4839589e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
11,549	Pixels: Multiversion Wide Table Store for Data Lakes	2020	CIDR	4.1905499e-05
6,397	BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse	2024	SIGMOD	5.0749432e-05
9,236	AutoComp: Automated Data Compaction for Log-Structured Tables in Data Lakes	2025	SIGMOD	4.3648789e-05
9,700	Towards Functional Decomposition of Storage Formats	2025	CIDR	4.2967256e-05
739	Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores	2020	VLDB	0.00017365933
3,966	Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing	2019	SIGMOD	6.5782437e-05
10,248	Active Data Lakes: Regaining Physical Data Independence Without Losing Interoperability	2026	VLDB	4.1905499e-05
1,356	Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics	2021	CIDR	0.00012409986
5,105	Analyzing and Comparing Lakehouse Storage Systems	2023	CIDR	5.6916018e-05
6,683	Adaptive and Robust Query Execution for Lakehouses at Scale	2024	VLDB	4.9593505e-05