Magnus: A Holistic Approach to Data Management for Large-Scale Machine Learning Workloads
Summary: Magnus is a holistic data-management layer built on Apache Iceberg and tailored to large-scale ML workloads (wide tables, multimodal data). It combines resource-efficient storage formats with built-in vector and inverted indexes to speed retrieval. It adds scalable Git-like metadata branching, lightweight merge-on-read upserts, and native support for LRM/LMM training, and is deployed at ByteDance with substantial real-world gains. (summarized by gpt-5-mini on Feb 09 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Jun Song
2. Jingyi Ding
3. Irshad Kandy
4. Yanghao Lin
5. Zhongjia Wei
6. Zilong Zhou
7. Zhiwei Peng
8. Jixi Shan
9. Hongyue Mao
10. Xiuqi Huang
11. Xun Song
12. Cheng Chen
13. Yanjia Li
14. Tianhao Yang
15. Wei Jia
16. Xiaohong Dong
17. Kang Lei
18. Rui Shi
19. Pengwei Zhao
20. Wei Chen
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 14 of 14 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,637 | Machine Learning for Data Management: Problems and Solutions | 2018 | SIGMOD | 4.479892e-05 |
| 7,020 | LLM for Data Management | 2024 | VLDB | 4.8595728e-05 |
| 7,375 | BG3: A Cost Effective and I/O Efficient Graph Database in ByteDance | 2024 | SIGMOD | 4.7491278e-05 |
| 4,549 | Database-Agnostic Workload Management | 2019 | CIDR | 6.0926728e-05 |
| 495 | Milvus: A Purpose-Built Vector Data Management System | 2021 | SIGMOD | 2.1767688e-04 |
| 13,171 | Reimagining Deep Learning Systems Through the Lens of Data Systems | 2024 | VLDB | - |
| 9,236 | The Hopsworks Feature Store for Machine Learning | 2024 | SIGMOD | 4.3690661e-05 |
| 7,411 | ItemSuggest: A Data Management Platform for Machine Learned Ranking Services | 2019 | CIDR | 4.7364436e-05 |
| 4,003 | Data Platform for Machine Learning | 2019 | SIGMOD | 6.54347e-05 |
| 1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 1.1472681e-04 |