The "Big Data" Ecosystem at LinkedIn
Summary: LinkedIn's Hadoop-based analytics stack hides distributed-systems details from data scientists, enabling end-to-end analytics on massive data sets. Novelty: seamless last-mile integration (ingress/egress to online systems and production workflows) via a one-line Pig command, with use cases in recommendations and feeds. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
1. Roshan Sumbaly
2. Jay Kreps
3. Sam Shah
Incoming Citations (Sorted by Pagerank)
Showing 6 of 6 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,666 | HELIX: Holistic Optimization for Accelerating Iterative Machine Learning | 2019 | VLDB | 0.0001096361 |
| 6,261 | The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward | 2021 | VLDB | 5.1350714e-05 |
| 7,998 | Data Management for Social Networking | 2016 | PODS | 4.6101889e-05 |
| 8,078 | Meta-Dataflows: Efficient Exploratory Dataflow Jobs | 2018 | SIGMOD | 4.5914967e-05 |
| 9,266 | Redoop Infrastructure for Recurring Big Data Queries | 2014 | VLDB | 4.3667196e-05 |
| 11,958 | Shared Execution of Recurring Workloads in MapReduce | 2015 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 10 of 10 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614 |
| 13 | Mining Association Rules between Sets of Items in Large Databases | 1993 | SIGMOD | 0.0010864752 |
| 1,499 | Apache Hadoop Goes Realtime at Facebook | 2011 | SIGMOD | 0.00011675192 |
| 2,658 | Data Warehousing and Analytics Infrastructure at Facebook | 2010 | SIGMOD | 8.3607429e-05 |
| 3,434 | Efficient Bulk Insertion into a Distributed Ordered Table | 2008 | SIGMOD | 7.0994919e-05 |
| 3,601 | Large-Scale Machine Learning at Twitter | 2012 | SIGMOD | 6.9315087e-05 |
| 4,414 | Efficient Type-Ahead Search on Relational Data: a TASTIER Approach | 2009 | SIGMOD | 6.2056993e-05 |
| 4,572 | The Unified Logging Infrastructure for Data Analytics at Twitter | 2012 | VLDB | 6.0760183e-05 |
| 7,570 | Avatara: OLAP for Web-scale Analytics Products | 2012 | VLDB | 4.7077095e-05 |
| 8,413 | A Batch of PNUTS: Experiences Connecting Cloud Batch and Serving Systems | 2011 | SIGMOD | 4.5203012e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,504 | Supporting Scalable Analytics with Latency Constraints | 2015 | VLDB | 4.3341665e-05 |
| 5,838 | HadoopDB in Action: Building Real World Applications | 2010 | SIGMOD | 5.3059032e-05 |
| 6,821 | Hadoop's Adolescence: An analysis of Hadoop usage in scientific workloads | 2013 | VLDB | 4.9156923e-05 |
| 780 | Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience | 2009 | VLDB | 0.00016775082 |
| 2,476 | A Platform for Scalable One-Pass Analytics using MapReduce | 2011 | SIGMOD | 8.6960139e-05 |
| 157 | HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads | 2009 | VLDB | 0.00040397359 |
| 7,877 | Emerging Trends in the Enterprise Data Analytics: Connecting Hadoop and DB2 Warehouse | 2011 | SIGMOD | 4.6297559e-05 |
| 8,926 | Gobblin: Unifying Data Ingestion for Hadoop | 2015 | VLDB | 4.427232e-05 |
| 2,658 | Data Warehousing and Analytics Infrastructure at Facebook | 2010 | SIGMOD | 8.3607429e-05 |
| 3,601 | Large-Scale Machine Learning at Twitter | 2012 | SIGMOD | 6.9315087e-05 |