AutoComp: Automated Data Compaction for Log-Structured Tables in Data Lakes
Summary: Automates compaction for log-structured tables in data lakes to curb small-file proliferation and metadata bloat. AutoComp is scalable, draws on LinkedIn's production experience, and integrates with OpenHouse, enabling multi-objective data-layout optimizations. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Anja Gruenheid
2. Jesús Camacho-Rodríguez
3. Carlo Curino
4. Raghu Ramakrishnan
5. Stanislav Pak
6. Sumedh Sakdeo
7. Lenisha Gandhi
8. Sandeep K. Singhal
9. Pooja Nilangekar
10. Daniel J. Abadi
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing paper.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,196 | PTO: A Workload-driven Predictive Table Optimizer for Lakehouse Systems | 2026 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 19 of 19 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,196 | PTO: A Workload-driven Predictive Table Optimizer for Lakehouse Systems | 2026 | SIGMOD | 4.1945683e-05 |
| 3,644 | BtrBlocks: Efficient Columnar Compression for Data Lakes | 2023 | SIGMOD | 6.8854928e-05 |
| 10,248 | Active Data Lakes: Regaining Physical Data Independence Without Losing Interoperability | 2026 | VLDB | 4.1945683e-05 |
| 9,701 | Towards Functional Decomposition of Storage Formats | 2025 | CIDR | 4.3008468e-05 |
| 9,689 | LST-Bench: Benchmarking Log-Structured Tables in the Cloud | 2024 | SIGMOD | 4.3043822e-05 |
| 7,059 | Adaptive and Robust Query Execution for Lakehouses at Scale | 2024 | VLDB | 4.8477825e-05 |
| 1,960 | Compaction management in distributed key-value datastores | 2015 | VLDB | 9.9521444e-05 |
| 746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979 |
| 7,907 | Petabyte-Scale Row-Level Operations in Data Lakehouses | 2024 | VLDB | 4.6205839e-05 |
| 5,318 | Analyzing and Comparing Lakehouse Storage Systems | 2023 | CIDR | 5.5715872e-05 |