Database Paper Browser

Nested Parquet Is Flat, Why Not Use It? How To Scan Nested Data With On-the-Fly Key Generation and Joins

Summary: Processes nested Parquet in relational engines by scanning the flat Parquet columns, generating join keys on the fly, and using joins to reconstruct the nesting, without materializing an internal format. The approach is reported to make analytics substantially faster. (summarized by gpt-5-nano on Feb 09 2026)
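
The idea in the summary can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a single level of nesting, Dremel-style repetition and definition levels, and child values that are aligned one-to-one with their levels (real Parquet stores only the defined values). The function names and the sample `order_ids` / `item_*` columns are hypothetical.

```python
from itertools import accumulate


def generate_parent_keys(rep_levels):
    """Assign each flat child entry the 0-based ordinal of its parent record.

    Assumption: a repetition level of 0 marks the first entry of a new
    record, so a running count of zeros identifies the parent.
    """
    starts = (1 if level == 0 else 0 for level in rep_levels)
    return [count - 1 for count in accumulate(starts)]


def join_child_to_parent(parent_values, child_values, rep_levels, def_levels, max_def):
    """Pair every defined child value with its parent value (positional join)."""
    keys = generate_parent_keys(rep_levels)
    rows = []
    for key, value, def_level in zip(keys, child_values, def_levels):
        # Entries below the maximum definition level are placeholders
        # for empty or NULL lists and produce no joined row.
        if def_level == max_def:
            rows.append((parent_values[key], value))
    return rows


# Hypothetical data: three orders; the second has an empty item list.
order_ids = [10, 11, 12]
item_values = ["a", "b", None, "c", "d", "e"]
item_rep = [0, 1, 0, 0, 1, 1]
item_def = [1, 1, 0, 1, 1, 1]

flat = join_child_to_parent(order_ids, item_values, item_rep, item_def, max_def=1)
print(flat)
```

In this sketch the keys are derived while scanning the levels, so the nested column never has to be converted into a separate nested representation before it is joined with its parent.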

Paper ID: 7248
Venue: SIGMOD
Year: 2025
Pagerank: 4.1945683e-05
Overall Rank: 10,494 | 27.00%
DOI: 10.1145/3725329

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 23 of 23 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
98 | XMark: A Benchmark for XML Data Management | 2002 | VLDB | 0.00050023808
109 | Dremel: Interactive Analysis of Web-Scale Datasets | 2010 | VLDB | 0.00048186983
153 | Relational Databases for Querying XML Documents: Limitations and Opportunities | 1999 | VLDB | 0.00040784455
185 | DuckDB: an Embeddable Analytical Database | 2019 | SIGMOD | 0.00036538405
207 | Storing Semistructured Data with STORED | 1999 | SIGMOD | 0.00034611968
351 | Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs | 2009 | VLDB | 0.0002636504
404 | Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited | 2014 | VLDB | 0.00024143076
540 | Design and Evaluation of Main Memory Hash Join Algorithms for Multi-core CPUs | 2011 | SIGMOD | 0.0002063443
735 | Umbra: A Disk-Based System with In-Memory Performance | 2020 | CIDR | 0.00017452467
1,438 | AsterixDB: A Scalable, Open Source BDMS | 2014 | VLDB | 0.00011973592
2,062 | Dremel: A Decade of Interactive SQL Analysis at Web Scale | 2020 | VLDB | 9.6481955e-05
2,473 | Photon: A Fast Query Engine for Lakehouse Systems | 2022 | SIGMOD | 8.7237281e-05
3,644 | BtrBlocks: Efficient Columnar Compression for Data Lakes | 2023 | SIGMOD | 6.8854928e-05
3,721 | To Partition, or Not to Partition, That is the Join Question in a Real System | 2021 | SIGMOD | 6.8179379e-05
4,514 | An Empirical Evaluation of Columnar Storage Formats | 2024 | VLDB | 6.1204636e-05
4,704 | JSON Tiles: Fast Analytics on Semi-Structured Data | 2021 | SIGMOD | 5.9853687e-05
5,562 | A Deep Dive into Common Open Formats for Analytical DBMSs | 2023 | VLDB | 5.4331334e-05
6,078 | The Flatter, the Better: Query Compilation Based on the Flattening Transformation | 2015 | SIGMOD | 5.2225986e-05
6,658 | Scalable Querying of Nested Data | 2021 | VLDB | 4.9711629e-05
6,674 | Exploiting Common Patterns for Tree-Structured Data | 2017 | SIGMOD | 4.9663344e-05
7,427 | Selection Pushdown in Column Stores using Bit Manipulation Instructions | 2023 | SIGMOD | 4.7327406e-05
7,554 | Storing and Querying Tree-Structured Records in Dremel | 2014 | VLDB | 4.712434e-05
8,731 | Columnar Formats for Schemaless LSM-based Document Stores | 2022 | VLDB | 4.4577278e-05

Semantically Similar Papers