LDA*: A Robust and Large-scale Topic Modeling System
Summary: A systematic study of LDA samplers (AliasLDA, F+LDA, LightLDA, WarpLDA) with a hybrid, document-length-aware approach for robust, large-scale topic modeling. An asymmetric parameter-server architecture shifts computation to the server and reduces communication bottlenecks in large deployments, delivering up to 10x gains over prior systems. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Lele Yu
2. Ce Zhang
3. Yingxia Shao
4. Bin Cui
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,677 | HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework | 2022 | VLDB | 8.3268401e-05 |
| 3,808 | SketchML: Accelerating Distributed Machine Learning with Data Sketches | 2018 | SIGMOD | 6.7455428e-05 |
| 4,964 | PS2: Parameter Server on Spark | 2019 | SIGMOD | 5.7965988e-05 |
| 5,052 | HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training | 2022 | SIGMOD | 5.7337977e-05 |
| 9,469 | DimBoost: Boosting Gradient Boosting Decision Tree to Higher Dimensions | 2018 | SIGMOD | 4.3342363e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 3 of 3 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 328 | An Architecture for Parallel Topic Models | 2010 | VLDB | 0.0002728514 |
| 1,942 | Heterogeneity-aware Distributed Parameter Servers | 2017 | SIGMOD | 0.00010012691 |
| 6,014 | WarpLDA: a Cache Efficient O(1) Algorithm for Latent Dirichlet Allocation | 2016 | VLDB | 5.2415551e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,520 | Towards the Web of Concepts: Extracting Concepts from Large Datasets | 2010 | VLDB | 5.4614656e-05 |
| 8,027 | Diversity-Aware Top-k Publish/Subscribe for Text Stream | 2015 | SIGMOD | 4.6029624e-05 |
| 13,171 | Reimagining Deep Learning Systems Through the Lens of Data Systems | 2024 | VLDB | - |
| 7,890 | Mining a Search Engine’s Corpus: Efficient Yet Unbiased Sampling and Aggregate Estimation | 2011 | SIGMOD | 4.6249533e-05 |
| 428 | Latent Semantic Indexing: A Probabilistic Analysis | 1998 | PODS | 0.00023512226 |
| 11,834 | Topic Exploration in Spatio-Temporal Document Collections | 2016 | SIGMOD | 4.1945683e-05 |
| 11,954 | Scalable Topical Phrase Mining from Text Corpora | 2015 | VLDB | 4.1945683e-05 |
| 6,014 | WarpLDA: a Cache Efficient O(1) Algorithm for Latent Dirichlet Allocation | 2016 | VLDB | 5.2415551e-05 |
| 328 | An Architecture for Parallel Topic Models | 2010 | VLDB | 0.0002728514 |
| 13,328 | Scalable Training of Hierarchical Topic Models | 2018 | VLDB | - |