Big Data Analytics with Datalog Queries on Spark
Summary: BigDatalog lets users express large-scale analytics as concise, declarative Datalog queries that run on Spark. It uses compilation and optimization techniques to support recursive queries efficiently on Spark, and empirical comparisons against leading Datalog systems show that Spark-based Datalog analytics are viable.
(summarized by gpt-5-nano on Feb 09 2026)
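The recursion mentioned in the summary refers to recursive Datalog rules such as transitive closure. As a minimal, single-machine illustration (not BigDatalog's Spark implementation), the Python sketch below evaluates the textbook transitive-closure program with semi-naive evaluation, a standard fixpoint technique in which each iteration joins only the facts derived in the previous iteration. The relation names `arc` and `tc` and the function `transitive_closure` are illustrative assumptions.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def transitive_closure(edges: Set[Edge]) -> Set[Edge]:
    """Semi-naive evaluation of the (illustrative) program:
         tc(X, Y) :- arc(X, Y).
         tc(X, Y) :- tc(X, Z), arc(Z, Y).
    """
    # Index arc by its source node so the join on Z is a lookup.
    by_src: Dict[str, Set[str]] = {}
    for z, y in edges:
        by_src.setdefault(z, set()).add(y)

    tc: Set[Edge] = set(edges)     # exit rule: every arc is a tc fact
    delta: Set[Edge] = set(edges)  # facts first derived in the last iteration
    while delta:                   # stop at the fixpoint (no new facts)
        new: Set[Edge] = set()
        # Join only the delta with arc, not the whole of tc.
        for x, z in delta:
            for y in by_src.get(z, ()):
                if (x, y) not in tc:
                    new.add((x, y))
        tc |= new
        delta = new
    return tc
```

For example, `transitive_closure({("a", "b"), ("b", "c"), ("c", "d")})` yields the three input edges plus `("a", "c")`, `("b", "d")` and `("a", "d")`. Restricting each join to the delta avoids rederiving known facts, which is why semi-naive evaluation is the usual baseline for recursive Datalog engines.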
- Paper ID: 5256
- Venue: SIGMOD
- Year: 2016
- Pagerank: 7.3912411e-05
- Overall Rank: 3,200 | 77.74%
- DOI: 10.1145/2882903.2915229
Incoming Citations (Sorted by Pagerank)
Showing 23 of 23 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,919 | RaSQL: Greater Power and Performance for Big Data Analytics with Recursive-aggregate-SQL on Spark | 2019 | SIGMOD | 7.9047279e-05 |
| 3,988 | All-in-One: Graph Processing in RDBMSs Revisited | 2017 | SIGMOD | 6.5589605e-05 |
| 4,701 | Tensors: An abstraction for general data processing | 2021 | VLDB | 5.9866564e-05 |
| 4,920 | Shared Arrangements: practical inter-query sharing for streaming dataflows | 2020 | VLDB | 5.8241888e-05 |
| 5,259 | On the Optimization of Recursive Relational Queries: Application to Graph Queries | 2020 | SIGMOD | 5.5984356e-05 |
| 5,705 | Datalog Unchained | 2021 | PODS | 5.3621239e-05 |
| 6,216 | Automating Incremental and Asynchronous Evaluation for Recursive Aggregate Data Processing | 2020 | SIGMOD | 5.1534945e-05 |
| 6,276 | Scaling-Up In-Memory Datalog Processing: Observations and Techniques | 2019 | VLDB | 5.1314426e-05 |
| 6,612 | Complete Event Trend Detection in High-Rate Event Streams | 2017 | SIGMOD | 4.9948556e-05 |
| 7,342 | Optimizing Recursive Queries with Program Synthesis | 2022 | SIGMOD | 4.7576316e-05 |
| 8,396 | Optimizing Declarative Graph Queries at Large Scale | 2019 | SIGMOD | 4.5276541e-05 |
| 8,883 | Optimizing Parallel Recursive Datalog Evaluation on Multicore Machines | 2022 | SIGMOD | 4.4285471e-05 |
| 8,994 | Automatic Index Selection for Large-Scale Datalog Computation | 2019 | VLDB | 4.4129398e-05 |
| 9,330 | Parallel Query Processing: To Separate Communication from Computation | 2022 | SIGMOD | 4.3556432e-05 |
| 9,813 | Datalog with First-Class Facts | 2025 | VLDB | 4.2783272e-05 |
| 9,814 | Optimizing Nested Recursive Queries | 2024 | SIGMOD | 4.2783272e-05 |
| 10,284 | FlowLog: Efficient and Extensible Datalog via Incrementality | 2026 | VLDB | 4.1945683e-05 |
| 10,404 | Dynamic Pruning for Recursive Joins | 2025 | SIGMOD | 4.1945683e-05 |
| 11,053 | Efficient Enumeration of Recursive Plans in Transformation-based Query Optimizers | 2024 | VLDB | 4.1945683e-05 |
| 11,130 | The Vadalog Parallel System: Distributed Reasoning with Datalog+/- | 2024 | VLDB | 4.1945683e-05 |
| 11,154 | Templating Shuffles | 2023 | CIDR | 4.1945683e-05 |
| 11,341 | Juggler: Autonomous Cost Optimization and Performance Prediction of Big Data Applications | 2022 | SIGMOD | 4.1945683e-05 |
| 11,647 | Ariadne: Online Provenance for Big Graph Analytics | 2019 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 21 of 21 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4 | Pregel: A System for Large-Scale Graph Processing | 2010 | SIGMOD | 0.0019005923 |
| 37 | Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud | 2012 | VLDB | 0.0007522744 |
| 63 | A Message Passing Framework for Logical Query Evaluation | 1986 | SIGMOD | 0.00063714145 |
| 66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801 |
| 70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166 |
| 778 | Declarative Networking: Language, Execution and Optimization | 2006 | SIGMOD | 0.00016791276 |
| 780 | Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience | 2009 | VLDB | 0.00016775082 |
| 1,294 | Distributed SociaLite: A Datalog-Based Language for Large-Scale Graph Analysis | 2013 | VLDB | 0.00012779484 |
| 1,374 | Relational Transducers for Declarative Networking | 2011 | PODS | 0.0001230835 |
| 2,079 | A Framework for the Parallel Processing of Datalog Queries | 1990 | SIGMOD | 9.5979932e-05 |
| 2,172 | Spinning Fast Iterative Data Flows | 2012 | VLDB | 9.3706587e-05 |
| 2,221 | A New Paradigm For Parallel And Distributed Rule-Processing | 1990 | SIGMOD | 9.2614541e-05 |
| 2,458 | REX: Recursive, Delta-Based Data-Centric Computation | 2012 | VLDB | 8.7683462e-05 |
| 3,069 | Evita Raced: Metacompilation for Declarative Networks | 2008 | VLDB | 7.6151182e-05 |
| 4,222 | Why A Single Parallelization Strategy Is Not Enough In Knowledge Bases | 1989 | PODS | 6.3480169e-05 |
| 4,223 | Monotonic Aggregation in Deductive Databases | 1992 | PODS | 6.3474752e-05 |
| 4,370 | Distributed Processing Of Logic Programs | 1988 | SIGMOD | 6.2486359e-05 |
| 4,696 | Asynchronous and Fault-Tolerant Recursive Datalog Evaluation in Shared-Nothing Engines | 2015 | VLDB | 5.9911301e-05 |
| 5,003 | Graph Queries in a Next-Generation Datalog System | 2013 | VLDB | 5.7652482e-05 |
| 5,925 | Parallelizing Datalog Programs by Generalized Pivoting | 1991 | PODS | 5.2717743e-05 |
| 9,088 | Collaborative Access Control in WebdamLog | 2015 | SIGMOD | 4.3992936e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533 |
| 11,197 | QaaD (Query-as-a-Data): Scalable Execution of Massive Number of Small Queries in Spark | 2023 | SIGMOD | 4.1945683e-05 |
| 5,106 | Debugging Big Data Analytics in Spark with BigDebug | 2017 | SIGMOD | 5.6927181e-05 |
| 8,617 | A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning | 2024 | VLDB | 4.4846425e-05 |
| 8,883 | Optimizing Parallel Recursive Datalog Evaluation on Multicore Machines | 2022 | SIGMOD | 4.4285471e-05 |
| 2,919 | RaSQL: Greater Power and Performance for Big Data Analytics with Recursive-aggregate-SQL on Spark | 2019 | SIGMOD | 7.9047279e-05 |
| 9,124 | Dynamic Speculative Optimizations for SQL Compilation in Apache Spark | 2020 | VLDB | 4.391961e-05 |
| 7,794 | Large-scale Complex Analytics on Semi-structured Datasets using AsterixDB and Spark | 2016 | VLDB | 4.6482977e-05 |
| 11,576 | RASQL: A Powerful Language and its System for Big Data Applications | 2020 | SIGMOD | 4.1945683e-05 |
| 6,276 | Scaling-Up In-Memory Datalog Processing: Observations and Techniques | 2019 | VLDB | 5.1314426e-05 |