Large-Scale Machine Learning at Twitter
Summary: A case study of integrating machine learning into Twitter's Hadoop/Pig analytics stack. Extensions to Pig support supervised learning with online stochastic gradient descent (SGD) and ensembles. The ML tasks (sampling, feature generation, training, and testing) run as Pig loaders and UDFs, so ML workflows can be written as ordinary Pig scripts and run in production. (summarized by gpt-5-nano on Feb 09 2026)
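The summary mentions online SGD and ensembles without showing either. The sketch below illustrates the general idea in plain Python, not Pig, and it is not the paper's code: one logistic-regression model is trained per data partition with a single online SGD pass, and the members' predicted probabilities are averaged at prediction time. All function names, the learning rate, the round-robin partitioning, and the toy data are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Clamp the score to avoid overflow in math.exp for extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_online_sgd(examples, num_features, learning_rate=0.1):
    """One pass of online SGD for logistic regression over (features, label) pairs."""
    weights = [0.0] * num_features
    for features, label in examples:
        score = sum(w * x for w, x in zip(weights, features))
        error = label - sigmoid(score)
        # Update the weights immediately after seeing each example.
        for i, x in enumerate(features):
            weights[i] += learning_rate * error * x
    return weights


def train_ensemble(examples, num_features, num_partitions):
    """Train one model per partition (assumed: round-robin split of the data)."""
    partitions = [examples[i::num_partitions] for i in range(num_partitions)]
    return [train_online_sgd(p, num_features) for p in partitions]


def predict_ensemble(models, features):
    """Average the members' predicted probabilities and threshold at 0.5."""
    probs = [sigmoid(sum(w * x for w, x in zip(m, features))) for m in models]
    return 1 if sum(probs) / len(probs) >= 0.5 else 0


def make_toy_data(n, seed=0):
    """Hypothetical data: the label is 1 when the second feature is positive."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append(([1.0, x], 1 if x > 0 else 0))  # first feature is a bias term
    return data


if __name__ == "__main__":
    models = train_ensemble(make_toy_data(2000), num_features=2, num_partitions=4)
    print(predict_ensemble(models, [1.0, 0.8]), predict_ensemble(models, [1.0, -0.8]))
```

Because each member sees only its own partition and needs a single pass, the members can be trained independently of one another, which is what makes this pattern a natural fit for a data-parallel system.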
Authors
1. Jimmy Lin
2. Alek Kolcz
Incoming Citations (Sorted by Pagerank)
Showing 9 of 9 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,402 | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML | 2014 | VLDB | 0.00012180605 |
| 1,794 | Summingbird: A Framework for Integrating Batch and Online MapReduce Computations | 2014 | VLDB | 0.00010532024 |
| 4,572 | The Unified Logging Infrastructure for Data Analytics at Twitter | 2012 | VLDB | 6.0760183e-05 |
| 4,857 | The "Big Data" Ecosystem at LinkedIn | 2013 | SIGMOD | 5.8736144e-05 |
| 4,885 | GraphJet: Real-Time Content Recommendations at Twitter | 2016 | VLDB | 5.8534354e-05 |
| 5,257 | Probabilistic Demand Forecasting at Scale | 2017 | VLDB | 5.6003925e-05 |
| 6,131 | Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture | 2013 | SIGMOD | 5.1956688e-05 |
| 8,928 | Tripartite Graph Clustering for Dynamic Sentiment Analysis on Social Media | 2014 | SIGMOD | 4.427232e-05 |
| 12,101 | Optimization Strategies for A/B Testing on HADOOP | 2013 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614 |
| 157 | HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads | 2009 | VLDB | 0.00040397359 |
| 168 | MAD Skills: New Analysis Practices for Big Data | 2009 | VLDB | 0.00038946305 |
| 328 | An Architecture for Parallel Topic Models | 2010 | VLDB | 0.0002728514 |
| 780 | Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience | 2009 | VLDB | 0.00016775082 |
| 2,337 | Efficient Processing of Data Warehousing Queries in a Split Execution Environment | 2011 | SIGMOD | 9.0098186e-05 |
| 3,115 | Llama: Leveraging Columnar Storage for Scalable Join Processing in the MapReduce Framework | 2011 | SIGMOD | 7.543505e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,476 | A Platform for Scalable One-Pass Analytics using MapReduce | 2011 | SIGMOD | 8.6960139e-05 |
| 9,504 | Supporting Scalable Analytics with Latency Constraints | 2015 | VLDB | 4.3341665e-05 |
| 1,402 | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML | 2014 | VLDB | 0.00012180605 |
| 6,131 | Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture | 2013 | SIGMOD | 5.1956688e-05 |
| 4,906 | Machine Learning for Big Data | 2013 | SIGMOD | 5.8389053e-05 |
| 2,658 | Data Warehousing and Analytics Infrastructure at Facebook | 2010 | SIGMOD | 8.3607429e-05 |
| 3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614 |
| 4,572 | The Unified Logging Infrastructure for Data Analytics at Twitter | 2012 | VLDB | 6.0760183e-05 |
| 780 | Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience | 2009 | VLDB | 0.00016775082 |
| 4,857 | The "Big Data" Ecosystem at LinkedIn | 2013 | SIGMOD | 5.8736144e-05 |