Database Paper Browser

Petabyte-Scale Row-Level Operations in Data Lakehouses

Summary: The paper adds petabyte-scale row-level updates and deletes to Apache Iceberg on Spark, either by materializing changes at the file level or by recording them lazily as equality or position deletes. It avoids shuffles with storage-partitioned joins, reduces write amplification with runtime filters, and uses adaptive writes. It also presents the tradeoffs between these approaches for different use cases and reports gains of roughly 10×. (summarized by gpt-5-mini on Feb 09 2026)
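
The contrast the summary draws between file-level materialization and lazy position or equality deletes can be illustrated with a toy model. The sketch below is a simplification written for this page, not the paper's implementation and not the Iceberg or Spark API: the `Table` class, the helper names, and the file names are all hypothetical, and the sequence-number rules that real equality deletes follow are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy table (hypothetical): data files plus pending delete records."""
    files: dict                                            # file name -> list of row dicts
    position_deletes: set = field(default_factory=set)     # (file name, row index)
    equality_deletes: list = field(default_factory=list)   # (column, value)


def make_table():
    return Table(files={
        "a.parquet": [{"id": 1}, {"id": 2}],
        "b.parquet": [{"id": 3}, {"id": 4}],
    })


def delete_copy_on_write(table, column, value):
    """Eager materialization: rewrite every file that holds a matching row.

    Returns the number of surviving rows that had to be rewritten, a rough
    stand-in for write amplification.
    """
    rewritten = 0
    for name, rows in list(table.files.items()):
        if any(r[column] == value for r in rows):
            kept = [r for r in rows if r[column] != value]
            table.files[name] = kept
            rewritten += len(kept)
    return rewritten


def delete_by_position(table, column, value):
    """Lazy: find matching rows now, but only record (file, position) pairs."""
    for name, rows in table.files.items():
        for i, r in enumerate(rows):
            if r[column] == value:
                table.position_deletes.add((name, i))


def delete_by_equality(table, column, value):
    """Lazy: record the predicate without reading any data file."""
    table.equality_deletes.append((column, value))


def scan(table):
    """Read path: apply pending position and equality deletes while scanning."""
    out = []
    for name, rows in table.files.items():
        for i, r in enumerate(rows):
            if (name, i) in table.position_deletes:
                continue
            if any(r.get(c) == v for c, v in table.equality_deletes):
                continue
            out.append(r)
    return out
```

In this model all three strategies return the same rows from `scan`, but they place the cost differently: the copy-on-write path rewrites the surviving rows of each touched file at delete time, the position-delete path reads data at delete time but writes only small delete records, and the equality-delete path defers all matching work to readers.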

Paper ID: 13615
Venue: VLDB
Year: 2024
Pagerank: 4.6205839e-05
Overall Rank: 7,907 | 45.00%
DOI: 10.14778/3685800.3685834

Incoming Citations (Sorted by Pagerank)

Showing 3 of 3 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 21 of 21 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
21 | C-Store: A Column-oriented DBMS | 2005 | VLDB | 0.00086087497
23 | A Critique of ANSI SQL Isolation Levels | 1995 | SIGMOD | 0.00083894938
32 | Differential Files: Their Application To The Maintenance Of Large Data Bases | 1976 | SIGMOD | 0.00077486306
35 | MonetDB/X100: Hyper-Pipelining Query Execution | 2005 | CIDR | 0.00076197749
66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801
167 | The Snowflake Elastic Data Warehouse | 2016 | SIGMOD | 0.00039180521
349 | Serializable Isolation for Snapshot Databases | 2008 | SIGMOD | 0.00026440605
368 | Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing | 1998 | VLDB | 0.000254931
426 | Amazon Redshift and the Case for Simpler Data Warehouses | 2015 | SIGMOD | 0.00023594359
746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979
794 | Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) | 2010 | VLDB | 0.00016605103
1,377 | Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics | 2021 | CIDR | 0.00012296941
1,949 | Positional Update Handling in Column Stores | 2010 | SIGMOD | 9.9864085e-05
2,062 | Dremel: A Decade of Interactive SQL Analysis at Web Scale | 2020 | VLDB | 9.6481955e-05
2,439 | CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop | 2011 | VLDB | 8.8190594e-05
3,973 | Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing | 2019 | SIGMOD | 6.5758017e-05
4,530 | Big Metadata: When Metadata is Big Data | 2021 | VLDB | 6.1075429e-05
5,318 | Analyzing and Comparing Lakehouse Storage Systems | 2023 | CIDR | 5.5715872e-05
5,888 | Magnet: Push-based Shuffle Service for Large-scale Data Processing | 2020 | VLDB | 5.2873617e-05
6,264 | VectorH: Taking SQL-on-Hadoop to the Next Level | 2016 | SIGMOD | 5.1348427e-05
9,689 | LST-Bench: Benchmarking Log-Structured Tables in the Cloud | 2024 | SIGMOD | 4.3043822e-05
Semantically Similar Papers