Database Paper Browser

HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework

Summary: HET scales the training of huge embedding models with a cache-enabled distributed framework that exploits the skewed popularity of embeddings. Embedding-level consistency with write-time staleness enables cache coherence, yielding up to an 88% reduction in communication and up to a 20.68x speedup. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12793
Venue: VLDB
Year: 2022
Pagerank: 8.3268401e-05
Overall Rank: 2,677 | 81.38%
DOI: 10.14778/3489496.3489511

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 22 of 22 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
3,506 | BlindFL: Vertical Federated Machine Learning without Peeking into Your Data | 2022 | SIGMOD | 7.0291192e-05
4,047 | Orca: Scalable Temporal Graph Neural Network Training with Theoretical Guarantees | 2023 | SIGMOD | 6.4972105e-05
5,018 | DGC: Training Dynamic Graphs with Spatio-Temporal Non-Uniformity using Graph Partitioning by Chunks | 2023 | SIGMOD | 5.7567672e-05
5,052 | HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training | 2022 | SIGMOD | 5.7337977e-05
5,345 | NeutronStream: A Dynamic GNN Training Framework with Sliding Window for Graph Streams | 2024 | VLDB | 5.5567697e-05
6,377 | Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism | 2023 | VLDB | 5.0911095e-05
6,998 | PetPS: Supporting Huge Embedding Models with Persistent Memory | 2023 | VLDB | 4.8676312e-05
7,536 | Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent | 2023 | VLDB | 4.7176331e-05
8,126 | SDPipe: A Semi-Decentralized Framework for Heterogeneity-aware Pipeline-parallel Training | 2023 | VLDB | 4.5796615e-05
8,439 | Accelerating Graph Indexing for ANNS on Modern CPUs | 2025 | SIGMOD | 4.5128946e-05
8,737 | Scheduling Data Processing Pipelines for Incremental Training on MLP-based Recommendation Models | 2025 | SIGMOD | 4.456315e-05
8,808 | FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement | 2023 | SIGMOD | 4.4454035e-05
9,094 | FEC: Efficient Deep Recommendation Model Training with Flexible Embedding Communication | 2023 | SIGMOD | 4.3980444e-05
9,402 | CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models | 2024 | SIGMOD | 4.3441378e-05
9,408 | Experimental Analysis of Large-scale Learnable Vector Storage Compression | 2024 | VLDB | 4.3441378e-05
9,596 | Scalable Graph Convolutional Network Training on Distributed-Memory Systems | 2023 | VLDB | 4.319218e-05
9,677 | Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving | 2025 | SIGMOD | 4.3047774e-05
9,805 | MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training | 2025 | SIGMOD | 4.2805224e-05
9,966 | Towards Communication-efficient Vertical Federated Learning Training via Cache-enabled Local Updates | 2022 | VLDB | 4.2269436e-05
10,011 | A Comprehensive Benchmark on Spectral GNNs: The Impact on Efficiency, Memory, and Effectiveness | 2026 | SIGMOD | 4.1945683e-05
10,974 | GE2: A General and Efficient Knowledge Graph Embedding Learning System | 2024 | SIGMOD | 4.1945683e-05
11,265 | EmbedX: A Versatile, Efficient and Scalable Platform to Embed Both Graphs and High-Dimensional Sparse Data | 2023 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 6 of 6 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers