Nested Parquet Is Flat, Why Not Use It? How To Scan Nested Data With On-the-Fly Key Generation and Joins
Summary: Processes nested Parquet in relational engines by generating join keys on the fly. The engine scans Parquet's flat columns directly and rebuilds the nesting with joins on those keys, rather than first materializing the data in an internal nested format. The summary reports faster analytics as the result.
(summarized by gpt-5-nano on Feb 09 2026)
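The idea in the summary can be illustrated with a small sketch. This is not the paper's implementation. It assumes a single repeated leaf column stored with Dremel-style repetition and definition levels, as Parquet does, and the names `generate_parent_keys` and `join_on_generated_keys` are hypothetical. Repetition level 0 marks the first entry of a new record, so a running count of zeros yields a key that joins each child value back to its parent row.

```python
def generate_parent_keys(rep_levels):
    """Map every level entry of a repeated leaf column to its record index.

    Assumption: Dremel-style encoding, where repetition level 0 marks the
    first entry belonging to a new top-level record.
    """
    keys = []
    record = -1
    for rep in rep_levels:
        if rep == 0:
            record += 1  # a new record starts here
        keys.append(record)
    return keys


def join_on_generated_keys(parent_values, child_values, rep_levels, def_levels, max_def):
    """Pair each stored child value with its parent via the generated key.

    Values are stored only for entries whose definition level equals
    max_def; other entries (empty or null lists) carry no value and are
    skipped, which gives inner-join semantics.
    """
    keys = generate_parent_keys(rep_levels)
    values = iter(child_values)
    rows = []
    for key, definition in zip(keys, def_levels):
        if definition == max_def:
            rows.append((parent_values[key], next(values)))
    return rows


# Three records: "a" has items [1, 2], "b" has no items, "c" has items [3].
parents = ["a", "b", "c"]
items = [1, 2, 3]          # only defined values are stored
rep = [0, 1, 0, 0]         # 0 = first entry of a record
dfn = [1, 1, 0, 1]         # 0 = empty list, 1 = value present
result = join_on_generated_keys(parents, items, rep, dfn, max_def=1)
print(result)
```

The keys are never stored; they are derived during the scan from the levels that the flat column already carries. Deeper nesting would need one key per repetition level, which this sketch does not cover.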
- Paper ID: 7248
- Venue: SIGMOD
- Year: 2025
- Pagerank: 4.1945683e-05
- Overall Rank: 10,494 | 27.00%
- DOI: 10.1145/3725329
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
Outgoing Citations (Sorted by Pagerank)
Showing 23 of 23 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 98 | XMark: A Benchmark for XML Data Management | 2002 | VLDB | 0.00050023808 |
| 109 | Dremel: Interactive Analysis of Web-Scale Datasets | 2010 | VLDB | 0.00048186983 |
| 153 | Relational Databases for Querying XML Documents: Limitations and Opportunities | 1999 | VLDB | 0.00040784455 |
| 185 | DuckDB: an Embeddable Analytical Database | 2019 | SIGMOD | 0.00036538405 |
| 207 | Storing Semistructured Data with STORED | 1999 | SIGMOD | 0.00034611968 |
| 351 | Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs | 2009 | VLDB | 0.0002636504 |
| 404 | Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited | 2014 | VLDB | 0.00024143076 |
| 540 | Design and Evaluation of Main Memory Hash Join Algorithms for Multi-core CPUs | 2011 | SIGMOD | 0.0002063443 |
| 735 | Umbra: A Disk-Based System with In-Memory Performance | 2020 | CIDR | 0.00017452467 |
| 1,438 | AsterixDB: A Scalable, Open Source BDMS | 2014 | VLDB | 0.00011973592 |
| 2,062 | Dremel: A Decade of Interactive SQL Analysis at Web Scale | 2020 | VLDB | 9.6481955e-05 |
| 2,473 | Photon: A Fast Query Engine for Lakehouse Systems | 2022 | SIGMOD | 8.7237281e-05 |
| 3,644 | BtrBlocks: Efficient Columnar Compression for Data Lakes | 2023 | SIGMOD | 6.8854928e-05 |
| 3,721 | To Partition, or Not to Partition, That is the Join Question in a Real System | 2021 | SIGMOD | 6.8179379e-05 |
| 4,514 | An Empirical Evaluation of Columnar Storage Formats | 2024 | VLDB | 6.1204636e-05 |
| 4,704 | JSON Tiles: Fast Analytics on Semi-Structured Data | 2021 | SIGMOD | 5.9853687e-05 |
| 5,562 | A Deep Dive into Common Open Formats for Analytical DBMSs | 2023 | VLDB | 5.4331334e-05 |
| 6,078 | The Flatter, the Better: Query Compilation Based on the Flattening Transformation | 2015 | SIGMOD | 5.2225986e-05 |
| 6,658 | Scalable Querying of Nested Data | 2021 | VLDB | 4.9711629e-05 |
| 6,674 | Exploiting Common Patterns for Tree-Structured Data | 2017 | SIGMOD | 4.9663344e-05 |
| 7,427 | Selection Pushdown in Column Stores using Bit Manipulation Instructions | 2023 | SIGMOD | 4.7327406e-05 |
| 7,554 | Storing and Querying Tree-Structured Records in Dremel | 2014 | VLDB | 4.712434e-05 |
| 8,731 | Columnar Formats for Schemaless LSM-based Document Stores | 2022 | VLDB | 4.4577278e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 5,562 | A Deep Dive into Common Open Formats for Analytical DBMSs | 2023 | VLDB | 5.4331334e-05 |
| 3,375 | Query Shredding: Efficient Relational Evaluation of Queries over Nested Multisets | 2014 | SIGMOD | 7.1633324e-05 |
| 9,001 | The Power of Nested Parallelism in Big Data Processing – Hitting Three Flies with One Slap – | 2021 | SIGMOD | 4.4107627e-05 |
| 4,514 | An Empirical Evaluation of Columnar Storage Formats | 2024 | VLDB | 6.1204636e-05 |
| 2,110 | A Recursive Algebra and Query Optimization for Nested Relations | 1989 | SIGMOD | 9.5315487e-05 |
| 4,411 | An Implementation for Nested Relational Databases | 1988 | VLDB | 6.2071929e-05 |
| 8,680 | A Practical Approach to Groupjoin and Nested Aggregates | 2021 | VLDB | 4.4694927e-05 |
| 2,468 | Supporting Flat Relations by a Nested Relational Kernel | 1987 | VLDB | 8.7416405e-05 |
| 12,534 | A Nested Relational Approach to Processing SQL Subqueries | 2005 | SIGMOD | 4.1945683e-05 |
| 6,658 | Scalable Querying of Nested Data | 2021 | VLDB | 4.9711629e-05 |