Samza: Stateful Scalable Stream Processing at LinkedIn
Summary: Samza provides stateful, scalable stream processing. It keeps state partitioned and local to each task, backs that state with a changelog, and uses host affinity to recover quickly after failures. It treats finite data as a stream, so the same job can read from Kafka, Databus, or HDFS without code changes. The design supports processing flows and linear scaling.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 11446
- Venue: VLDB
- Year: 2017
- Pagerank: 9.00711e-05
- Overall Rank: 2,338 | 83.74%
- DOI: (none listed)
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 23 of 23 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 2,853 | Timon: A Timestamped Event Database for Efficient Telemetry Data Processing and Analytics | 2020 | SIGMOD | 8.0108722e-05 |
| 4,795 | Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines | 2020 | SIGMOD | 5.9158043e-05 |
| 4,822 | Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka | 2021 | SIGMOD | 5.8959131e-05 |
| 5,939 | Clonos: Consistent Causal Recovery for Highly-Available Streaming Dataflows | 2021 | SIGMOD | 5.2641681e-05 |
| 6,436 | Providing Streaming Joins as a Service at Facebook | 2018 | VLDB | 5.0636254e-05 |
| 6,721 | Beyond Analytics: The Evolution of Stream Processing Systems | 2020 | SIGMOD | 4.9492015e-05 |
| 6,871 | Towards General and Efficient Online Tuning for Spark | 2023 | VLDB | 4.8997004e-05 |
| 6,988 | CrocodileDB: Efficient Database Execution through Intelligent Deferment | 2020 | CIDR | 4.8718019e-05 |
| 7,234 | MgCrab: Transaction Crabbing for Live Migration in Deterministic Database Systems | 2019 | VLDB | 4.7941449e-05 |
| 7,373 | Hazelcast Jet: Low-latency Stream Processing at the 99.99th Percentile | 2021 | VLDB | 4.7494183e-05 |
| 7,938 | Correctness in Stream Processing: Challenges and Opportunities | 2022 | CIDR | 4.613363e-05 |
| 8,217 | Spur: Mitigating Slow Instances in Large-Scale Streaming Pipelines | 2020 | SIGMOD | 4.5568298e-05 |
| 8,909 | What's the Difference? Incremental Processing with Change Queries in Snowflake | 2023 | SIGMOD | 4.427232e-05 |
| 9,217 | Elasticutor: Rapid Elasticity for Realtime Stateful Stream Processing | 2019 | SIGMOD | 4.3712054e-05 |
| 9,318 | Disaggregated State Management in Apache Flink® 2.0 | 2025 | VLDB | 4.3556432e-05 |
| 9,604 | GeaFlow: A Graph Extended and Accelerated Dataflow System | 2023 | SIGMOD | 4.3177432e-05 |
| 9,733 | ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems | 2023 | VLDB | 4.2942813e-05 |
| 9,803 | Railgun: managing large streaming windows under MAD requirements | 2021 | VLDB | 4.2807806e-05 |
| 10,509 | Styx: Transactional Stateful Functions on Streaming Dataflows | 2025 | SIGMOD | 4.1945683e-05 |
| 10,962 | Fault Tolerance Placement in the Internet of Things | 2024 | SIGMOD | 4.1945683e-05 |
| 11,435 | Synchronization Schemas | 2021 | PODS | 4.1945683e-05 |
| 11,673 | Online Template Induction for Machine-Generated Emails | 2019 | VLDB | 4.1945683e-05 |
| 11,728 | Challenges and Experiences in Building an Efficient Apache Beam Runner For IBM Streams | 2018 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 15 of 15 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614 |
| 70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166 |
| 191 | The Design of the Borealis Stream Processing Engine | 2005 | CIDR | 0.00035738595 |
| 288 | Storm @Twitter | 2014 | SIGMOD | 0.00028939871 |
| 314 | MillWheel: Fault-Tolerant Stream Processing at Internet Scale | 2013 | VLDB | 0.00028084774 |
| 538 | The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing | 2015 | VLDB | 0.00020678804 |
| 824 | Twitter Heron: Stream Processing at Scale | 2015 | SIGMOD | 0.0001623129 |
| 1,226 | Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management | 2013 | SIGMOD | 0.00013180799 |
| 1,853 | On Brewing Fresh Espresso: LinkedIn’s Distributed Data Serving Platform | 2013 | SIGMOD | 0.00010320369 |
| 2,264 | S-Store: Streaming Meets Transaction Processing | 2015 | VLDB | 9.1575142e-05 |
| 3,556 | Solving Big Data Challenges for Enterprise Application Performance Management | 2012 | VLDB | 6.9770145e-05 |
| 5,049 | Run-Time Operator State Spilling for Memory Intensive Long-Running Queries | 2006 | SIGMOD | 5.7372423e-05 |
| 5,263 | Consistent Regions: Guaranteed Tuple Processing in IBM Streams | 2016 | VLDB | 5.5976361e-05 |
| 6,856 | Liquid: Unifying Nearline and Offline Big Data Integration | 2015 | CIDR | 4.9060615e-05 |
| 8,162 | Ambry: LinkedIn's Scalable Geo-Distributed Object Store | 2016 | SIGMOD | 4.5723648e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 10,862 | How Reliable Are Streams? End-to-End Processing-Guarantee Validation and Performance Benchmarking of Stream Processing Systems | 2025 | VLDB | 4.1945683e-05 |
| 10,789 | Ursa: A Lakehouse-Native Data Streaming Engine for Kafka | 2025 | VLDB | 4.1945683e-05 |
| 11,819 | Toward High-Performance Distributed Stream Processing via Approximate Fault Tolerance | 2017 | VLDB | 4.1945683e-05 |
| 6,856 | Liquid: Unifying Nearline and Offline Big Data Integration | 2015 | CIDR | 4.9060615e-05 |
| 4,795 | Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines | 2020 | SIGMOD | 5.9158043e-05 |
| 5,753 | Building a Replicated Logging System with Apache Kafka | 2015 | VLDB | 5.3404371e-05 |
| 9,496 | Scabbard: Single-Node Fault-Tolerant Stream Processing | 2022 | VLDB | 4.3341665e-05 |
| 1,226 | Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management | 2013 | SIGMOD | 0.00013180799 |
| 4,822 | Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka | 2021 | SIGMOD | 5.8959131e-05 |
| 11,804 | State Management in Apache Flink | 2017 | VLDB | 4.1945683e-05 |