sPCA: Scalable Principal Component Analysis for Big Data on Distributed Platforms
Summary: Introduces sPCA, a scalable PCA algorithm optimized for distributed big-data platforms. It leverages sparse matrix operations, minimizes intermediate data, and is implemented on both MapReduce and Spark; it outperforms Mahout-PCA and MLlib-PCA in accuracy, speed, and the amount of data shuffled. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Tarek Elgamal
2. Maysam Yabandeh
3. Ashraf Aboulnaga
4. Waleed Mustafa
5. Mohamed Hefeeda
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,041 | QCore: Data-Efficient, On-Device Continual Calibration for Quantized Models | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 3 of 3 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 413 | HaLoop: Efficient Iterative Data Processing on Large Clusters | 2010 | VLDB | 0.00023904409 |
| 543 | MLbase: A Distributed Machine-learning System | 2013 | CIDR | 0.00020526854 |
| 1,876 | ArrayStore: A Storage Manager for Complex Parallel Array Processing | 2011 | SIGMOD | 0.00010239284 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 13,343 | M3: Scaling Up Machine Learning via Memory Mapping | 2016 | SIGMOD | - |
| 2,674 | Minimal MapReduce Algorithms | 2013 | SIGMOD | 8.3328645e-05 |
| 7,019 | Bridging the Gap Between HPC and Big Data Frameworks | 2017 | VLDB | 4.860057e-05 |
| 4,437 | Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics | 2015 | VLDB | 6.1907793e-05 |
| 2,476 | A Platform for Scalable One-Pass Analytics using MapReduce | 2011 | SIGMOD | 8.6960139e-05 |
| 11,423 | Scalable Robust Graph Embedding with Spark | 2022 | VLDB | 4.1945683e-05 |
| 3,129 | Scalable Big Graph Processing in MapReduce | 2014 | SIGMOD | 7.5008242e-05 |
| 2,848 | Exploiting Matrix Dependency for Efficient Distributed Matrix Computation | 2015 | SIGMOD | 8.0208832e-05 |
| 11,835 | An Efficient MapReduce Cube Algorithm for Varied Data Distributions | 2016 | SIGMOD | 4.1945683e-05 |
| 7,949 | Efficient Matrix Sketching over Distributed Data | 2017 | PODS | 4.613363e-05 |