Spark SQL: Relational Data Processing in Spark
Summary: Spark SQL integrates relational processing into Spark through the DataFrame API, unifying SQL queries with Spark's functional programming workflow. Catalyst, an extensible optimizer written in Scala, supports composable rules, code generation, JSON schema inference, and query federation to external databases. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Michael Armbrust
2. Reynold S. Xin
3. Cheng Lian
4. Yin Huai
5. Davies Liu
6. Joseph K. Bradley
7. Xiangrui Meng
8. Tomer Kaftan
9. Michael J. Franklin
10. Ali Ghodsi
11. Matei Zaharia
Incoming Citations (Sorted by Pagerank)
Showing 6 of 206 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,694 | An Experimental Evaluation of Garbage Collectors on Big Data Applications | 2019 | VLDB | 4.1945683e-05 |
| 11,749 | An Authorization Model for Multi-Provider Queries | 2018 | VLDB | 4.1945683e-05 |
| 11,753 | Effective Temporal Dependence Discovery in Time Series Data | 2018 | VLDB | 4.1945683e-05 |
| 11,774 | Query Processing Techniques for Big Spatial-Keyword Data | 2017 | SIGMOD | 4.1945683e-05 |
| 11,948 | Tutorial: SQL-on-Hadoop Systems | 2015 | VLDB | 4.1945683e-05 |
| 13,096 | Blink Twice - Automatic Workload Pinning and Regression Detection for Versionless Apache Spark using Retries | 2025 | SIGMOD | - |
Outgoing Citations (Sorted by Pagerank)
Showing 15 of 15 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,535 | Scaling Spark in the Real World: Performance and Usability | 2015 | VLDB | 6.9992495e-05 |
| 542 | Shark: SQL and Rich Analytics at Scale | 2013 | SIGMOD | 0.00020595648 |
| 557 | SystemML: Declarative Machine Learning on Spark | 2016 | VLDB | 0.00020197988 |
| 3,200 | Big Data Analytics with Datalog Queries on Spark | 2016 | SIGMOD | 7.3912411e-05 |
| 11,576 | RASQL: A Powerful Language and its System for Big Data Applications | 2020 | SIGMOD | 4.1945683e-05 |
| 1,548 | Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark | 2018 | SIGMOD | 0.00011431383 |
| 6,784 | SparkR: Scaling R Programs with Spark | 2016 | SIGMOD | 4.9265155e-05 |
| 9,584 | Introduction to Spark 2.0 for Database Researchers | 2016 | SIGMOD | 4.3218691e-05 |
| 9,124 | Dynamic Speculative Optimizations for SQL Compilation in Apache Spark | 2020 | VLDB | 4.391961e-05 |
| 2,919 | RaSQL: Greater Power and Performance for Big Data Analytics with Recursive-aggregate-SQL on Spark | 2019 | SIGMOD | 7.9047279e-05 |