Database Paper Browser

Samza: Stateful Scalable Stream Processing at LinkedIn

Summary: Samza enables stateful, scalable stream processing by keeping state partitioned and local, backed by a changelog, and by using host affinity for rapid recovery. It treats finite data as streams, so input from Kafka, Databus, or HDFS can be processed without code changes, and it scales linearly. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11446
Venue: VLDB
Year: 2017
Pagerank: 9.00711e-05
Overall Rank: 2,338 | 83.74%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 23 of 23 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,853 | Timon: A Timestamped Event Database for Efficient Telemetry Data Processing and Analytics | 2020 | SIGMOD | 8.0108722e-05
4,795 | Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines | 2020 | SIGMOD | 5.9158043e-05
4,822 | Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka | 2021 | SIGMOD | 5.8959131e-05
5,939 | Clonos: Consistent Causal Recovery for Highly-Available Streaming Dataflows | 2021 | SIGMOD | 5.2641681e-05
6,436 | Providing Streaming Joins as a Service at Facebook | 2018 | VLDB | 5.0636254e-05
6,721 | Beyond Analytics: The Evolution of Stream Processing Systems | 2020 | SIGMOD | 4.9492015e-05
6,871 | Towards General and Efficient Online Tuning for Spark | 2023 | VLDB | 4.8997004e-05
6,988 | CrocodileDB: Efficient Database Execution through Intelligent Deferment | 2020 | CIDR | 4.8718019e-05
7,234 | MgCrab: Transaction Crabbing for Live Migration in Deterministic Database Systems | 2019 | VLDB | 4.7941449e-05
7,373 | Hazelcast Jet: Low-latency Stream Processing at the 99.99th Percentile | 2021 | VLDB | 4.7494183e-05
7,938 | Correctness in Stream Processing: Challenges and Opportunities | 2022 | CIDR | 4.613363e-05
8,217 | Spur: Mitigating Slow Instances in Large-Scale Streaming Pipelines | 2020 | SIGMOD | 4.5568298e-05
8,909 | What's the Difference? Incremental Processing with Change Queries in Snowflake | 2023 | SIGMOD | 4.427232e-05
9,217 | Elasticutor: Rapid Elasticity for Realtime Stateful Stream Processing | 2019 | SIGMOD | 4.3712054e-05
9,318 | Disaggregated State Management in Apache Flink® 2.0 | 2025 | VLDB | 4.3556432e-05
9,604 | GeaFlow: A Graph Extended and Accelerated Dataflow System | 2023 | SIGMOD | 4.3177432e-05
9,733 | ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems | 2023 | VLDB | 4.2942813e-05
9,803 | Railgun: managing large streaming windows under MAD requirements | 2021 | VLDB | 4.2807806e-05
10,509 | Styx: Transactional Stateful Functions on Streaming Dataflows | 2025 | SIGMOD | 4.1945683e-05
10,962 | Fault Tolerance Placement in the Internet of Things | 2024 | SIGMOD | 4.1945683e-05
11,435 | Synchronization Schemas | 2021 | PODS | 4.1945683e-05
11,673 | Online Template Induction for Machine-Generated Emails | 2019 | VLDB | 4.1945683e-05
11,728 | Challenges and Experiences in Building an Efficient Apache Beam Runner For IBM Streams | 2018 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 15 of 15 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

