Parallel Data Analysis Directly on Scientific File Formats
Summary: Query processing runs directly on HDF5 scientific arrays, which eliminates the data-loading step in analysis workflows. The system uses bitmap indexing and in-memory parallel execution to exploit supercomputers, and it outperforms Hive by more than 10x as well as outperforming relational databases.
(summarized by gpt-5-nano on Feb 09 2026)
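The summary credits bitmap indexing and in-memory parallel execution for the speedups. The sketch below only illustrates those two general ideas and is not the paper's implementation. Several things are assumptions: it uses a simple equality-encoded bitmap (one bitmap per distinct value, stored as a Python `int`), it replaces HDF5 reads with a plain in-memory list, and the data and function names (`build_bitmap_index`, `range_query`, `parallel_range_query`) are hypothetical. A range predicate is answered by OR-ing the bitmaps of every qualifying value, and the array is split into chunks that are indexed and queried independently before the results are merged.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def build_bitmap_index(values: List[int]) -> Dict[int, int]:
    """Equality-encoded bitmap index: bit i of bitmaps[v] is set iff values[i] == v."""
    bitmaps: Dict[int, int] = {}
    for i, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << i)
    return bitmaps


def range_query(bitmaps: Dict[int, int], lo: int, hi: int) -> int:
    """Bitmap of positions whose value lies in [lo, hi], built by OR-ing matching bitmaps."""
    result = 0
    for v, bm in bitmaps.items():
        if lo <= v <= hi:
            result |= bm
    return result


def positions(bitmap: int) -> List[int]:
    """Decode a bitmap into its set bit positions, in ascending order."""
    out: List[int] = []
    i = 0
    while bitmap:
        if bitmap & 1:
            out.append(i)
        bitmap >>= 1
        i += 1
    return out


def parallel_range_query(values: List[int], lo: int, hi: int, chunk_size: int) -> List[int]:
    """Partition the array into chunks, index each chunk once, query chunks concurrently, merge."""
    # Each entry is (global offset of the chunk, that chunk's bitmap index).
    indexed: List[Tuple[int, Dict[int, int]]] = [
        (start, build_bitmap_index(values[start:start + chunk_size]))
        for start in range(0, len(values), chunk_size)
    ]

    def work(item: Tuple[int, Dict[int, int]]) -> List[int]:
        start, idx = item
        # Translate chunk-local positions back to global array positions.
        return [start + p for p in positions(range_query(idx, lo, hi))]

    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(work, indexed))  # map() preserves chunk order
    return [p for part in parts for p in part]


# Hypothetical data standing in for one array variable read from a file.
temperature = [3, 7, 7, 1, 9, 3, 5]
print(positions(range_query(build_bitmap_index(temperature), 3, 7)))  # [0, 1, 2, 5, 6]
print(parallel_range_query(temperature, 3, 7, chunk_size=3))          # [0, 1, 2, 5, 6]
```

In CPython the thread pool does not give true CPU parallelism because of the GIL. It is used here only to show the partition, query, and merge structure; a real engine would run chunks on separate cores or nodes and would typically use compressed bitmaps rather than raw integers.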
- Paper ID: 4917
- Venue: SIGMOD
- Year: 2014
- Pagerank: 8.1679384e-05
- Overall Rank: 2,757 | 80.83%
- DOI: 10.1145/2588555.2612185
Incoming Citations (Sorted by Pagerank)
Showing 17 of 17 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,343 | Comparative Evaluation of Big-Data Systems on Scientific Image Analytics Workloads | 2017 | VLDB | 7.1967343e-05 |
| 3,437 | Speculative Distributed CSV Data Parsing for Big Data Analytics | 2019 | SIGMOD | 7.0942161e-05 |
| 3,891 | Slalom: Coasting Through Raw Data via Adaptive Partitioning and Indexing | 2017 | VLDB | 6.659442e-05 |
| 4,326 | Fast Queries Over Heterogeneous Data Through Engine Customization | 2016 | VLDB | 6.288323e-05 |
| 4,704 | JSON Tiles: Fast Analytics on Semi-Structured Data | 2021 | SIGMOD | 5.9853687e-05 |
| 4,839 | ChronosDB: Distributed, File Based, Geospatial Array DBMS | 2018 | VLDB | 5.8875955e-05 |
| 5,301 | ReCache: Reactive Caching for Fast Analytics over Heterogeneous Data | 2018 | VLDB | 5.5790928e-05 |
| 6,407 | Just-In-Time Data Virtualization: Lightweight Data Management with ViDa | 2015 | CIDR | 5.076547e-05 |
| 7,360 | ParPaRaw: Massively Parallel Parsing of Delimiter-Separated Raw Data | 2020 | VLDB | 4.7525925e-05 |
| 7,830 | Scalable Structural Index Construction for JSON Analytics | 2021 | VLDB | 4.6388763e-05 |
| 7,917 | Array DBMS: Past, Present, and (Near) Future | 2021 | VLDB | 4.6173899e-05 |
| 8,788 | FishStore: Faster Ingestion with Subset Hashing | 2019 | SIGMOD | 4.451039e-05 |
| 9,379 | GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example | 2023 | SIGMOD | 4.3462787e-05 |
| 9,918 | Shared Load(ing): Efficient Bulk Loading into Optimized Storage | 2020 | CIDR | 4.2561557e-05 |
| 11,436 | Algorithms for a Topology-aware Massively Parallel Computation Model | 2021 | PODS | 4.1945683e-05 |
| 11,614 | BitFun: Fast Answers to Queries with Tunable Functions in Geospatial Array DBMS | 2020 | VLDB | 4.1945683e-05 |
| 11,850 | Vectorizing an In Situ Query Engine | 2016 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 15 of 15 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 21 | C-Store: A Column-oriented DBMS | 2005 | VLDB | 0.00086087497 |
| 35 | MonetDB/X100: Hyper-Pipelining Query Execution | 2005 | CIDR | 0.00076197749 |
| 113 | Encapsulation of Parallelism in the Volcano Query Processing System | 1990 | SIGMOD | 0.00046764513 |
| 318 | Overview of SciDB: Large Scale Array Storage, Processing and Analysis | 2010 | SIGMOD | 0.00027795661 |
| 404 | Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited | 2014 | VLDB | 0.00024143076 |
| 860 | The Multidimensional Database System RasDaMan | 1998 | SIGMOD | 0.00015860465 |
| 960 | A Comparison of Join Algorithms for Log Processing in MapReduce | 2010 | SIGMOD | 0.00015012242 |
| 1,035 | Bitmap Index Design and Evaluation | 1998 | SIGMOD | 0.00014532778 |
| 1,343 | NoDB: Efficient Query Execution on Raw Data Files | 2012 | SIGMOD | 0.00012482538 |
| 1,412 | A Query Language for Multidimensional Arrays: Design, Implementation, and Optimization Techniques | 1996 | SIGMOD | 0.00012122159 |
| 1,429 | A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses | 2009 | VLDB | 0.00012033518 |
| 1,704 | An Efficient Bitmap Encoding Scheme for Selection Queries | 1999 | SIGMOD | 0.000108332 |
| 1,876 | ArrayStore: A Storage Manager for Complex Parallel Array Processing | 2011 | SIGMOD | 0.00010239284 |
| 4,820 | SciQL: Array Data Processing Inside an RDBMS | 2013 | SIGMOD | 5.8972557e-05 |
| 5,294 | GLADE: Big Data Analytics Made Easy | 2012 | SIGMOD | 5.5810654e-05 |