Database Paper Browser

MillWheel: Fault-Tolerant Stream Processing at Internet Scale

Summary: MillWheel is a framework for building fault-tolerant, low-latency stream-processing applications at Internet scale. Users specify a directed computation graph and the application code for each node, and the system manages persistent state and the continuous flow of records. The paper describes MillWheel's notion of logical time, which supports time-based aggregations, explains its fault-tolerance guarantees, and presents a case study of an anomaly detector in use at Google. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 10541
Venue: VLDB
Year: 2013
Pagerank: 0.00028084774
Overall Rank: 314 | 97.82%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 50 of 66 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
288 | Storm @Twitter | 2014 | SIGMOD | 0.00028939871
538 | The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing | 2015 | VLDB | 0.00020678804
824 | Twitter Heron: Stream Processing at Scale | 2015 | SIGMOD | 0.0001623129
1,084 | Dhalion: Self-Regulating Stream Processing in Heron | 2017 | VLDB | 0.00014209714
1,098 | Trill: A High-Performance Incremental Query Processor for Diverse Analytics | 2015 | VLDB | 0.00014114442
1,613 | Realtime Data Processing at Facebook | 2016 | SIGMOD | 0.00011140777
1,794 | Summingbird: A Framework for Integrating Batch and Online MapReduce Computations | 2014 | VLDB | 0.00010532024
2,264 | S-Store: Streaming Meets Transaction Processing | 2015 | VLDB | 9.1575142e-05
2,338 | Samza: Stateful Scalable Stream Processing at LinkedIn | 2017 | VLDB | 9.00711e-05
2,853 | Timon: A Timestamped Event Database for Efficient Telemetry Data Processing and Analytics | 2020 | SIGMOD | 8.0108722e-05
3,051 | Partial Results in Database Systems | 2014 | SIGMOD | 7.6512591e-05
3,210 | Frontier: Resilient Edge Processing for the Internet of Things | 2018 | VLDB | 7.3746627e-05
3,333 | SnappyData: A Unified Cluster for Streaming, Transactions, and Interactive Analytics | 2017 | CIDR | 7.2093479e-05
3,378 | General Incremental Sliding-Window Aggregation | 2015 | VLDB | 7.1622572e-05
3,550 | Chi: A Scalable and Programmable Control Plane for Distributed Stream Processing Systems | 2018 | VLDB | 6.9843512e-05
3,704 | How to Win a Hot Dog Eating Contest: Distributed Incremental View Maintenance with Batch Updates | 2016 | SIGMOD | 6.827494e-05
3,762 | SABER: Window-Based Hybrid Stream Processing for Heterogeneous Architectures | 2016 | SIGMOD | 6.7804471e-05
4,044 | Megaphone: Latency-conscious state migration for distributed streaming dataflows | 2019 | VLDB | 6.4995312e-05
4,120 | Husky: Towards a More Efficient and Expressive Distributed Computing Framework | 2016 | VLDB | 6.4364588e-05
4,488 | Analyzing Efficient Stream Processing on Modern Hardware | 2019 | VLDB | 6.145117e-05
4,795 | Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines | 2020 | SIGMOD | 5.9158043e-05
4,822 | Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka | 2021 | SIGMOD | 5.8959131e-05
5,130 | One SQL to Rule Them All – an Efficient and Syntactically Idiomatic Approach to Management of Streams and Tables | 2019 | SIGMOD | 5.6755067e-05
5,193 | LightSaber: Efficient Window Aggregation on Multi-core Processors | 2020 | SIGMOD | 5.6371049e-05
5,211 | Tornado: A System For Real-Time Iterative Analysis Over Evolving Data | 2016 | SIGMOD | 5.6284829e-05
5,263 | Consistent Regions: Guaranteed Tuple Processing in IBM Streams | 2016 | VLDB | 5.5976361e-05
5,286 | StreamOps: Cloud-Native Runtime Management for Streaming Services in ByteDance | 2023 | VLDB | 5.5838392e-05
5,939 | Clonos: Consistent Causal Recovery for Highly-Available Streaming Dataflows | 2021 | SIGMOD | 5.2641681e-05
5,971 | Optimal and General Out-of-Order Sliding-Window Aggregation | 2019 | VLDB | 5.2480159e-05
6,109 | Pixida: Optimizing Data Parallel Jobs in Wide-Area Data Analytics | 2016 | VLDB | 5.2059441e-05
6,242 | Helios: Hyperscale Indexing for the Cloud & Edge | 2020 | VLDB | 5.1408379e-05
6,436 | Providing Streaming Joins as a Service at Facebook | 2018 | VLDB | 5.0636254e-05
6,648 | Grizzly: Efficient Stream Processing Through Adaptive Query Compilation | 2020 | SIGMOD | 4.9771723e-05
6,721 | Beyond Analytics: The Evolution of Stream Processing Systems | 2020 | SIGMOD | 4.9492015e-05
6,767 | Watermarks in Stream Processing Systems: Semantics and Comparative Analysis of Apache Flink and Google Cloud Dataflow | 2021 | VLDB | 4.9322174e-05
7,372 | Model-Free Control for Distributed Stream Data Processing using Deep Reinforcement Learning | 2018 | VLDB | 4.7496881e-05
7,373 | Hazelcast Jet: Low-latency Stream Processing at the 99.99th Percentile | 2021 | VLDB | 4.7494183e-05
7,534 | Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams | 2022 | VLDB | 4.7180004e-05
7,599 | Quill: Efficient, Transferable, and Rich Analytics at Scale | 2016 | VLDB | 4.7003593e-05
7,627 | Incremental Sliding Window Connectivity over Streaming Graphs | 2024 | VLDB | 4.6928167e-05
7,710 | Ananke: A Streaming Framework for Live Forward Provenance | 2021 | VLDB | 4.6719822e-05
8,001 | Rethinking Stateful Stream Processing with RDMA | 2022 | SIGMOD | 4.6092573e-05
8,611 | Efficient Incrementialization of Correlated Nested Aggregate Queries using Relative Partial Aggregate Indexes (RPAI) | 2022 | SIGMOD | 4.4852886e-05
8,746 | Texera: A System for Collaborative and Interactive Data Analytics Using Workflows | 2024 | VLDB | 4.456315e-05
8,909 | What's the Difference? Incremental Processing with Change Queries in Snowflake | 2023 | SIGMOD | 4.427232e-05
9,217 | Elasticutor: Rapid Elasticity for Realtime Stateful Stream Processing | 2019 | SIGMOD | 4.3712054e-05
9,302 | Shrink – Prescribing Resiliency Solutions for Streaming | 2017 | VLDB | 4.3587156e-05
9,318 | Disaggregated State Management in Apache Flink® 2.0 | 2025 | VLDB | 4.3556432e-05
9,496 | Scabbard: Single-Node Fault-Tolerant Stream Processing | 2022 | VLDB | 4.3341665e-05
9,504 | Supporting Scalable Analytics with Latency Constraints | 2015 | VLDB | 4.3341665e-05

Outgoing Citations (Sorted by Pagerank)

Showing 10 of 10 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers