Database Paper Browser


Parallel Data Analysis Directly on Scientific File Formats

Summary: Query processing runs directly on scientific arrays stored in HDF5, eliminating the data-loading step in analysis workflows. It uses bitmap indexing and in-memory parallel execution to exploit supercomputers, outperforming relational databases and, by more than 10x, Hive. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 4917
Venue: SIGMOD
Year: 2014
Pagerank: 8.1679384e-05
Overall Rank: 2,757 | 80.83%
DOI: 10.1145/2588555.2612185

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 17 of 17 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
3,343 | Comparative Evaluation of Big-Data Systems on Scientific Image Analytics Workloads | 2017 | VLDB | 7.1967343e-05
3,437 | Speculative Distributed CSV Data Parsing for Big Data Analytics | 2019 | SIGMOD | 7.0942161e-05
3,891 | Slalom: Coasting Through Raw Data via Adaptive Partitioning and Indexing | 2017 | VLDB | 6.659442e-05
4,326 | Fast Queries Over Heterogeneous Data Through Engine Customization | 2016 | VLDB | 6.288323e-05
4,704 | JSON Tiles: Fast Analytics on Semi-Structured Data | 2021 | SIGMOD | 5.9853687e-05
4,839 | ChronosDB: Distributed, File Based, Geospatial Array DBMS | 2018 | VLDB | 5.8875955e-05
5,301 | ReCache: Reactive Caching for Fast Analytics over Heterogeneous Data | 2018 | VLDB | 5.5790928e-05
6,407 | Just-In-Time Data Virtualization: Lightweight Data Management with ViDa | 2015 | CIDR | 5.076547e-05
7,360 | ParPaRaw: Massively Parallel Parsing of Delimiter-Separated Raw Data | 2020 | VLDB | 4.7525925e-05
7,830 | Scalable Structural Index Construction for JSON Analytics | 2021 | VLDB | 4.6388763e-05
7,917 | Array DBMS: Past, Present, and (Near) Future | 2021 | VLDB | 4.6173899e-05
8,788 | FishStore: Faster Ingestion with Subset Hashing | 2019 | SIGMOD | 4.451039e-05
9,379 | GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example | 2023 | SIGMOD | 4.3462787e-05
9,918 | Shared Load(ing): Efficient Bulk Loading into Optimized Storage | 2020 | CIDR | 4.2561557e-05
11,436 | Algorithms for a Topology-aware Massively Parallel Computation Model | 2021 | PODS | 4.1945683e-05
11,614 | BitFun: Fast Answers to Queries with Tunable Functions in Geospatial Array DBMS | 2020 | VLDB | 4.1945683e-05
11,850 | Vectorizing an In Situ Query Engine | 2016 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 15 of 15 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers