Database Paper Browser

Data Warehousing and Analytics Infrastructure at Facebook

Summary: Facebook's scalable data warehouse uses Scribe, Hadoop, and Hive to unify log collection, storage, and analytics for BI dashboards and feature services. The warehouse stores more than 15PB (2.5PB compressed) and ingests about 60TB/day (10TB compressed). The paper discusses design choices, day-to-day operations, and planned improvements. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 4314
Venue: SIGMOD
Year: 2010
Pagerank: 8.3607429e-05
Overall Rank: 2,658 | 81.52%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 17 of 17 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,613 | Realtime Data Processing at Facebook | 2016 | SIGMOD | 0.00011140777
1,814 | Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing | 2014 | VLDB | 0.00010458107
2,205 | ReStore: Reusing Results of MapReduce Jobs | 2012 | VLDB | 9.2920002e-05
2,928 | WANalytics: Analytics for a Geo-Distributed Data-Intensive World | 2015 | CIDR | 7.8812874e-05
4,572 | The Unified Logging Infrastructure for Data Analytics at Twitter | 2012 | VLDB | 6.0760183e-05
4,857 | The "Big Data" Ecosystem at LinkedIn | 2013 | SIGMOD | 5.8736144e-05
5,105 | Only Aggressive Elephants are Fast Elephants | 2012 | VLDB | 5.694494e-05
6,131 | Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture | 2013 | SIGMOD | 5.1956688e-05
6,261 | The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward | 2021 | VLDB | 5.1350714e-05
6,282 | Cheetah: Accelerating Database Queries with Switch Pruning | 2020 | SIGMOD | 5.128797e-05
6,856 | Liquid: Unifying Nearline and Offline Big Data Integration | 2015 | CIDR | 4.9060615e-05
7,324 | Compliant Geo-distributed Query Processing | 2021 | SIGMOD | 4.762032e-05
7,998 | Data Management for Social Networking | 2016 | PODS | 4.6101889e-05
9,375 | Efficient Big Data Processing in Hadoop MapReduce | 2012 | VLDB | 4.347384e-05
10,419 | Unified Lineage System: Tracking Data Provenance at Scale | 2025 | SIGMOD | 4.1945683e-05
11,849 | The Challenges of Global-scale Data Management | 2016 | SIGMOD | 4.1945683e-05
11,958 | Shared Execution of Recurring Workloads in MapReduce | 2015 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 1 of 1 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
80 | Weaving Relations for Cache Performance | 2001 | VLDB | 0.00055721729

Semantically Similar Papers