Bridging the Gap Between HPC and Big Data Frameworks
Summary: Presents an MPI-Spark integration that offloads computation from Spark to MPI while preserving Spark's fault tolerance and ecosystem. Analyzes four distributed graph and machine learning workloads, showing 3.1-17.7x speedups over native Spark with integration overheads included, so existing MPI libraries can be reused from Spark with minimal effort. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Michael Anderson
2. Shaden Smith
3. Narayanan Sundaram
4. Mihai Capota
5. Zheguang Zhao
6. Subramanya Dulloor
7. Nadathur Satish
8. Theodore L. Willke
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,813 | Datalog with First-Class Facts | 2025 | VLDB | 4.2783272e-05 |
| 9,913 | Chukonu: A Fully-Featured High-Performance Big Data Framework that Integrates a Native Compute Engine into Spark | 2022 | VLDB | 4.2565279e-05 |
| 11,559 | Approximate Pattern Matching in Massive Graphs with Precision and Recall Guarantees | 2020 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 2 of 2 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,678 | Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets | 2014 | SIGMOD | 0.00010933417 |
| 2,449 | GraphMat: High performance graph analytics made productive | 2015 | VLDB | 8.7915899e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,617 | A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning | 2024 | VLDB | 4.4846425e-05 |
| 13,349 | Trends and Challenges in Big Data Processing | 2016 | VLDB | - |
| 9,504 | Supporting Scalable Analytics with Latency Constraints | 2015 | VLDB | 4.3341665e-05 |
| 8,534 | Translation of Array-Based Loops to Distributed Data-Parallel Programs | 2020 | VLDB | 4.4937074e-05 |
| 3,200 | Big Data Analytics with Datalog Queries on Spark | 2016 | SIGMOD | 7.3912411e-05 |
| 7,032 | Building the Enterprise Fabric for Big Data with Vertica and Spark Integration | 2016 | SIGMOD | 4.8559744e-05 |
| 2,848 | Exploiting Matrix Dependency for Efficient Distributed Matrix Computation | 2015 | SIGMOD | 8.0208832e-05 |
| 8,221 | Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous Hardware | 2022 | VLDB | 4.5556812e-05 |
| 4,437 | Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics | 2015 | VLDB | 6.1907793e-05 |
| 3,535 | Scaling Spark in the Real World: Performance and Usability | 2015 | VLDB | 6.9992495e-05 |