ArrayMorph: Optimizing Hyperslab Queries on the Cloud for Machine Learning Pipelines
Summary: A cost-based, multi-phase optimizer for cloud hyperslab (subtensor) reads. It models array serialization, chunking, and platform costs to automatically choose among chunk reads, byte-range reads, and server-side lambdas. It integrates with PyTorch, reduces transferred data by up to 9.8×, speeds up pipelines by up to 1.7×, and lowers monetary cost by up to 9×.
(summarized by gpt-5-mini on Feb 09 2026)
- Paper ID: 13952
- Venue: VLDB
- Year: 2025
- Pagerank: 4.1945683e-05
- Overall Rank: 10,662 | 25.83%
- DOI: 10.14778/3746405.3746437
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 17 of 17 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801 |
| 70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166 |
| 167 | The Snowflake Elastic Data Warehouse | 2016 | SIGMOD | 0.00039180521 |
| 316 | NoScope: Optimizing Neural Network Queries over Video at Scale | 2017 | VLDB | 0.00027988668 |
| 318 | Overview of SciDB: Large Scale Array Storage, Processing and Analysis | 2010 | SIGMOD | 0.00027795661 |
| 426 | Amazon Redshift and the Case for Simpler Data Warehouses | 2015 | SIGMOD | 0.00023594359 |
| 734 | The TileDB Array Data Storage Manager | 2017 | VLDB | 0.00017455248 |
| 1,326 | Starling: A Scalable Query Engine on Cloud Functions | 2020 | SIGMOD | 0.00012576952 |
| 1,876 | ArrayStore: A Storage Manager for Complex Parallel Array Processing | 2011 | SIGMOD | 0.00010239284 |
| 2,424 | Lambada: Interactive Data Analytics on Cold Data Using Serverless Cloud Infrastructure | 2020 | SIGMOD | 8.8380822e-05 |
| 2,791 | Towards Demystifying Serverless Machine Learning Training | 2021 | SIGMOD | 8.1206618e-05 |
| 4,667 | FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS | 2021 | VLDB | 6.0116919e-05 |
| 4,839 | ChronosDB: Distributed, File Based, Geospatial Array DBMS | 2018 | VLDB | 5.8875955e-05 |
| 5,960 | Skew-Aware Join Optimization for Array Databases | 2015 | SIGMOD | 5.2559595e-05 |
| 6,507 | Similarity Join over Array Data | 2016 | SIGMOD | 5.0337166e-05 |
| 7,092 | CompuCache: Remote Computable Caching using Spot VMs | 2022 | CIDR | 4.8370308e-05 |
| 9,601 | SkyPIE: A Fast & Accurate Oracle for Object Placement | 2024 | SIGMOD | 4.3177432e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 13,425 | Data Mining Algorithms as a Service in the Cloud: Exploiting Relational Database Systems | 2013 | SIGMOD | - |
| 4,961 | Releasing Cloud Databases from the Chains of Performance Prediction Models | 2017 | CIDR | 5.7984657e-05 |
| 3,254 | Query Processing on Tensor Computation Runtimes | 2022 | VLDB | 7.3161051e-05 |
| 2,568 | Towards Cost-Optimal Query Processing in the Cloud | 2021 | VLDB | 8.5239227e-05 |
| 2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05 |
| 9,222 | Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning | 2021 | VLDB | 4.3698672e-05 |
| 9,236 | The Hopsworks Feature Store for Machine Learning | 2024 | SIGMOD | 4.3690661e-05 |
| 1,876 | ArrayStore: A Storage Manager for Complex Parallel Array Processing | 2011 | SIGMOD | 0.00010239284 |
| 6,191 | Automatic Optimization of Matrix Implementations for Distributed Machine Learning and Linear Algebra | 2021 | SIGMOD | 5.1642282e-05 |
| 4,870 | Exploiting Cloud Object Storage for High-Performance Analytics | 2023 | VLDB | 5.8613885e-05 |