Petabyte-Scale Row-Level Operations in Data Lakehouses
Summary: Adds petabyte-scale row-level updates and deletes to Iceberg and Spark, using either file-level materialization or lazy equality/position deletes. The approach avoids shuffles with storage-partitioned joins, cuts write amplification with runtime filters, and uses adaptive writes. The paper also shows use-case tradeoffs and gains of roughly 10×.
(summarized by gpt-5-mini on Feb 09 2026)
- Paper ID: 13615
- Venue: VLDB
- Year: 2024
- Pagerank: 4.6205839e-05
- Overall Rank: 7,907 | 45.00%
- DOI: 10.14778/3685800.3685834
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 21 of 21 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 21 | C-Store: A Column-oriented DBMS | 2005 | VLDB | 0.00086087497 |
| 23 | A Critique of ANSI SQL Isolation Levels | 1995 | SIGMOD | 0.00083894938 |
| 32 | Differential Files: Their Application To The Maintenance Of Large Data Bases | 1976 | SIGMOD | 0.00077486306 |
| 35 | MonetDB/X100: Hyper-Pipelining Query Execution | 2005 | CIDR | 0.00076197749 |
| 66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801 |
| 167 | The Snowflake Elastic Data Warehouse | 2016 | SIGMOD | 0.00039180521 |
| 349 | Serializable Isolation for Snapshot Databases | 2008 | SIGMOD | 0.00026440605 |
| 368 | Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing | 1998 | VLDB | 0.000254931 |
| 426 | Amazon Redshift and the Case for Simpler Data Warehouses | 2015 | SIGMOD | 0.00023594359 |
| 746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979 |
| 794 | Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) | 2010 | VLDB | 0.00016605103 |
| 1,377 | Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics | 2021 | CIDR | 0.00012296941 |
| 1,949 | Positional Update Handling in Column Stores | 2010 | SIGMOD | 9.9864085e-05 |
| 2,062 | Dremel: A Decade of Interactive SQL Analysis at Web Scale | 2020 | VLDB | 9.6481955e-05 |
| 2,439 | CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop | 2011 | VLDB | 8.8190594e-05 |
| 3,973 | Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing | 2019 | SIGMOD | 6.5758017e-05 |
| 4,530 | Big Metadata: When Metadata is Big Data | 2021 | VLDB | 6.1075429e-05 |
| 5,318 | Analyzing and Comparing Lakehouse Storage Systems | 2023 | CIDR | 5.5715872e-05 |
| 5,888 | Magnet: Push-based Shuffle Service for Large-scale Data Processing | 2020 | VLDB | 5.2873617e-05 |
| 6,264 | VectorH: Taking SQL-on-Hadoop to the Next Level | 2016 | SIGMOD | 5.1348427e-05 |
| 9,689 | LST-Bench: Benchmarking Log-Structured Tables in the Cloud | 2024 | SIGMOD | 4.3043822e-05 |
Semantically Similar Papers