A Comparison of Platforms for Implementing and Running Very Large Scale Machine Learning Algorithms
Summary: A comparative benchmark of four platforms for large-scale ML inference across five hierarchical-model tasks. Uses 70,000 hours of Amazon EC2 compute time to compare runtimes, tuning, and programming effort, highlighting data-management tradeoffs for DB researchers. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Zhuhua Cai
2. Zekai J. Gao
3. Shangyu Luo
4. Luis L. Perez
5. Zografoula Vagena
6. Christopher Jermaine
Incoming Citations (Sorted by Pagerank)
Showing 7 of 7 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 396 | One Trillion Edges: Graph Processing at Facebook-Scale | 2015 | VLDB | 0.00024424102 |
| 834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159 |
| 1,279 | Towards Linear Algebra over Normalized Data | 2017 | VLDB | 0.00012868394 |
| 1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681 |
| 3,948 | A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics | 2018 | VLDB | 0.000065959084 |
| 6,322 | The BUDS Language for Distributed Bayesian Machine Learning | 2017 | SIGMOD | 0.000051124615 |
| 9,332 | PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development | 2018 | SIGMOD | 0.000043556432 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614 |
| 4 | Pregel: A System for Large-Scale Graph Processing | 2010 | SIGMOD | 0.0019005923 |
| 140 | The MADlib Analytics Library or MAD Skills, the SQL | 2012 | VLDB | 0.00042270404 |
| 543 | MLbase: A Distributed Machine-learning System | 2013 | CIDR | 0.00020526854 |
| 1,158 | Simulation of Database-Valued Markov Chains Using SimSQL | 2013 | SIGMOD | 0.0001361064 |
| 1,372 | SQLEM: Fast Clustering in SQL using the EM Algorithm | 2000 | SIGMOD | 0.00012318334 |
| 1,495 | Ricardo: Integrating R and Hadoop | 2010 | SIGMOD | 0.00011691049 |