Database Paper Browser


The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing

Summary: Proposes the Dataflow Model, an abstraction for processing unbounded, out-of-order data streams that supports event-time processing with tunable trade-offs among correctness, latency, and cost. Argues that systems should assume new data may keep arriving and previously seen data may be retracted, and presents the model's formal semantics, core principles, and validation. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11057
Venue: VLDB
Year: 2015
Pagerank: 0.00020678804
Overall Rank: 538 | 96.26%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 50 of 66 citing papers.

Overall Rank | Citing Paper | Year | Venue | Pagerank
1,548 | Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark | 2018 | SIGMOD | 1.1431383e-04
1,613 | Realtime Data Processing at Facebook | 2016 | SIGMOD | 1.1140777e-04
2,338 | Samza: Stateful Scalable Stream Processing at LinkedIn | 2017 | VLDB | 9.00711e-05
2,853 | Timon: A Timestamped Event Database for Efficient Telemetry Data Processing and Analytics | 2020 | SIGMOD | 8.0108722e-05
3,333 | SnappyData: A Unified Cluster for Streaming, Transactions, and Interactive Analytics | 2017 | CIDR | 7.2093479e-05
3,355 | F1 Query: Declarative Querying at Scale | 2018 | VLDB | 7.1829142e-05
3,386 | Lethe: A Tunable Delete-Aware LSM Engine | 2020 | SIGMOD | 7.1577103e-05
3,550 | Chi: A Scalable and Programmable Control Plane for Distributed Stream Processing Systems | 2018 | VLDB | 6.9843512e-05
4,021 | Parallel Algorithms for Constructing Range and Nearest-Neighbor Searching Data Structures | 2016 | PODS | 6.5225987e-05
4,044 | Megaphone: Latency-conscious state migration for distributed streaming dataflows | 2019 | VLDB | 6.4995312e-05
4,488 | Analyzing Efficient Stream Processing on Modern Hardware | 2019 | VLDB | 6.145117e-05
4,795 | Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines | 2020 | SIGMOD | 5.9158043e-05
4,822 | Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka | 2021 | SIGMOD | 5.8959131e-05
5,130 | One SQL to Rule Them All – an Efficient and Syntactically Idiomatic Approach to Management of Streams and Tables | 2019 | SIGMOD | 5.6755067e-05
5,193 | LightSaber: Efficient Window Aggregation on Multi-core Processors | 2020 | SIGMOD | 5.6371049e-05
5,286 | StreamOps: Cloud-Native Runtime Management for Streaming Services in ByteDance | 2023 | VLDB | 5.5838392e-05
5,732 | TcpRT: Instrument and Diagnostic Analysis System for Service Quality of Cloud Databases at Massive Scale in Real-time | 2018 | SIGMOD | 5.3501728e-05
5,971 | Optimal and General Out-of-Order Sliding-Window Aggregation | 2019 | VLDB | 5.2480159e-05
6,242 | Helios: Hyperscale Indexing for the Cloud & Edge | 2020 | VLDB | 5.1408379e-05
6,436 | Providing Streaming Joins as a Service at Facebook | 2018 | VLDB | 5.0636254e-05
6,629 | A Holistic View of Stream Partitioning Costs | 2017 | VLDB | 4.9880986e-05
6,721 | Beyond Analytics: The Evolution of Stream Processing Systems | 2020 | SIGMOD | 4.9492015e-05
6,759 | AStream: Ad-hoc Shared Stream Processing | 2019 | SIGMOD | 4.9352213e-05
6,767 | Watermarks in Stream Processing Systems: Semantics and Comparative Analysis of Apache Flink and Google Cloud Dataflow | 2021 | VLDB | 4.9322174e-05
6,912 | CYADB: A Database that Covers Your Ask | 2018 | VLDB | 4.8925595e-05
7,373 | Hazelcast Jet: Low-latency Stream Processing at the 99.99th Percentile | 2021 | VLDB | 4.7494183e-05
7,407 | Intermittent Query Processing | 2019 | VLDB | 4.7373205e-05
7,599 | Quill: Efficient, Transferable, and Rich Analytics at Scale | 2016 | VLDB | 4.7003593e-05
7,710 | Ananke: A Streaming Framework for Live Forward Provenance | 2021 | VLDB | 4.6719822e-05
8,001 | Rethinking Stateful Stream Processing with RDMA | 2022 | SIGMOD | 4.6092573e-05
8,140 | Erebus: Explaining the Outputs of Data Streaming Queries | 2023 | VLDB | 4.5768015e-05
8,217 | Spur: Mitigating Slow Instances in Large-Scale Streaming Pipelines | 2020 | SIGMOD | 4.5568298e-05
8,596 | Prompt: Dynamic Data-Partitioning for Distributed Micro-batch Stream Processing Systems | 2020 | SIGMOD | 4.4887993e-05
8,987 | Differentially Private Stream Processing at Scale | 2024 | VLDB | 4.4144429e-05
9,073 | CrocodileDB in Action: Resource-Efficient Query Execution by Exploiting Time Slackness | 2020 | VLDB | 4.4023079e-05
9,187 | POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance | 2024 | VLDB | 4.3780059e-05
9,217 | Elasticutor: Rapid Elasticity for Realtime Stateful Stream Processing | 2019 | SIGMOD | 4.3712054e-05
9,318 | Disaggregated State Management in Apache Flink® 2.0 | 2025 | VLDB | 4.3556432e-05
9,401 | Vortex: A Stream-oriented Storage Engine For Big Data Analytics | 2024 | SIGMOD | 4.3441378e-05
9,604 | GeaFlow: A Graph Extended and Accelerated Dataflow System | 2023 | SIGMOD | 4.3177432e-05
9,797 | Dalton: Learned Partitioning for Distributed Data Streams | 2023 | VLDB | 4.2818172e-05
9,803 | Railgun: managing large streaming windows under MAD requirements | 2021 | VLDB | 4.2807806e-05
9,881 | VStream: A Distributed Streaming Vector Search System | 2025 | VLDB | 4.2643674e-05
10,077 | Enjima: A Resource-Adaptive Stream Processing System | 2026 | SIGMOD | 4.1945683e-05
10,183 | Mixtera: A Data Plane for Foundation Model Training | 2026 | SIGMOD | 4.1945683e-05
10,259 | Scarf: Self-Adaptive Tuning via Multi-Objective Reinforcement Learning for Apache Flink | 2026 | VLDB | 4.1945683e-05
10,410 | Oceanus: Enable SLO-Aware Vertical Autoscaling for Cloud-Native Streaming Services in Tencent | 2025 | SIGMOD | 4.1945683e-05
10,417 | Streaming Democratized: Ease Across the Latency Spectrum with Delayed View Semantics and Snowflake Dynamic Tables | 2025 | SIGMOD | 4.1945683e-05
10,616 | Unraveling the Impact of Window Semantics: Optimizing Join Order for Efficient Stream Processing | 2025 | VLDB | 4.1945683e-05
10,766 | Scribe: How Meta transports terabytes per second in real time | 2025 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers