Database Paper Browser

AutoComp: Automated Data Compaction for Log-Structured Tables in Data Lakes

Summary: AutoComp automates compaction for log-structured tables in data lakes to curb small-file proliferation and metadata bloat. It is built to scale, is informed by experience at LinkedIn, and integrates with OpenHouse to enable multi-objective data-layout optimization. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 7093
Venue: SIGMOD
Year: 2025
Pagerank: 4.3690661e-05
Overall Rank: 9,232 | 35.78%
DOI: 10.1145/3722212.3724430

Authors

Incoming Citations (Sorted by Pagerank)

Showing 1 of 1 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
10,196 | PTO: A Workload-driven Predictive Table Optimizer for Lakehouse Systems | 2026 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 19 of 19 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
32 | Differential Files: Their Application To The Maintenance Of Large Data Bases | 1976 | SIGMOD | 0.00077486306
87 | Hekaton: SQL Server’s Memory-Optimized OLTP Engine | 2013 | SIGMOD | 0.00052389723
379 | bLSM: A General Purpose Log Structured Merge Tree | 2012 | SIGMOD | 0.0002493527
659 | The Making of TPC-DS | 2006 | VLDB | 0.00018500853
746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979
858 | Efficient Transaction Processing in SAP HANA Database – The End of a Column Store Myth | 2012 | SIGMOD | 0.000158756
1,700 | Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads | 2016 | SIGMOD | 0.00010858865
1,960 | Compaction management in distributed key-value datastores | 2015 | VLDB | 9.9521444e-05
3,038 | Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics | 2017 | SIGMOD | 7.6717218e-05
3,793 | Constructing and Analyzing the LSM Compaction Design Space | 2021 | VLDB | 6.7617833e-05
3,973 | Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing | 2019 | SIGMOD | 6.5758017e-05
4,425 | Nova: Continuous Pig/Hadoop Workflows | 2011 | SIGMOD | 6.198382e-05
4,717 | Cloud Analytics Benchmark | 2023 | VLDB | 5.9751539e-05
7,907 | Petabyte-Scale Row-Level Operations in Data Lakehouses | 2024 | VLDB | 4.6205839e-05
8,519 | Extending Polaris to Support Transactions | 2024 | SIGMOD | 4.494088e-05
8,642 | Automatic Workload Driven Index Defragmentation | 2011 | VLDB | 4.4785896e-05
9,190 | MLOS in Action: Bridging the Gap Between Experimentation and Auto-Tuning in the Cloud | 2024 | VLDB | 4.3768215e-05
9,286 | Fragmentation in Large Object Repositories | 2007 | CIDR | 4.3623271e-05
9,689 | LST-Bench: Benchmarking Log-Structured Tables in the Cloud | 2024 | SIGMOD | 4.3043822e-05

Semantically Similar Papers