Database Paper Browser

Spark SQL: Relational Data Processing in Spark

Summary: Spark SQL integrates relational processing into Spark through the DataFrame API, unifying SQL queries with Spark's functional programming workflow. Catalyst, an extensible optimizer written in Scala, supports composable rules, code generation, JSON schema inference, and federation to external databases. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 5022
Venue: SIGMOD
Year: 2015
Pagerank: 0.00061639801
Overall Rank: 66 | 99.55%
DOI: 10.1145/2723372.2742797

Incoming Citations (Sorted by Pagerank)

Showing 50 of 206 citing papers.

Rank Citing Paper Year Venue Pagerank
3,648 One WITH RECURSIVE is Worth Many GOTOs 2021 SIGMOD 6.8831123e-05
3,704 How to Win a Hot Dog Eating Contest: Distributed Incremental View Maintenance with Batch Updates 2016 SIGMOD 6.827494e-05
3,737 Skipping-oriented Partitioning for Columnar Layouts 2017 VLDB 6.8033227e-05
3,763 Flexible Rule-Based Decomposition and Metadata Independence in Modin: A Parallel Dataframe System 2022 VLDB 6.7801795e-05
3,768 F1 Lightning: HTAP as a Service 2020 VLDB 6.7782774e-05
3,918 On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML 2018 VLDB 6.6315176e-05
3,922 Pushing Data-Induced Predicates Through Joins in Big-Data Clusters 2020 VLDB 6.6291079e-05
3,944 AQP++: Connecting Approximate Query Processing With Aggregate Precomputation for Interactive Analytics 2018 SIGMOD 6.6078243e-05
3,967 Apache IoTDB: A Time Series Database for IoT Applications 2023 SIGMOD 6.5796647e-05
4,036 Adore: Differentially Oblivious Relational Database Operators 2023 VLDB 6.5089579e-05
4,326 Fast Queries Over Heterogeneous Data Through Engine Customization 2016 VLDB 6.288323e-05
4,368 Evolving Databases for New-Gen Big Data Applications 2017 CIDR 6.2491345e-05
4,409 Declarative Recursive Computation on an RDBMS 2019 VLDB 6.2104034e-05
4,419 Don't Hold My Data Hostage - A Case For Client Protocol Redesign 2017 VLDB 6.2022597e-05
4,602 Accelerating Raw Data Analysis with the ACCORDA Software and Hardware Architecture 2019 VLDB 6.0567387e-05
4,624 Wildfire: Concurrent Blazing Data Ingest and Analytics 2016 SIGMOD 6.0411906e-05
4,650 LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data 2016 VLDB 6.0234336e-05
4,658 ExplainIt! - A Declarative Root-cause Analysis Engine for Time Series Data 2019 SIGMOD 6.0183783e-05
4,667 FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS 2021 VLDB 6.0116919e-05
4,677 Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications 2018 SIGMOD 6.0047822e-05
4,701 Tensors: An abstraction for general data processing 2021 VLDB 5.9866564e-05
4,704 JSON Tiles: Fast Analytics on Semi-Structured Data 2021 SIGMOD 5.9853687e-05
4,725 GeCo: Quality Counterfactual Explanations in Real Time 2021 VLDB 5.9697637e-05
4,773 PolyFrame: A Retargetable Query-based Approach to Scaling Dataframes 2021 VLDB 5.9320139e-05
4,804 Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload 2021 SIGMOD 5.910467e-05
4,916 UlTraMan: A Unified Platform for Big Trajectory Data Management and Analytics 2018 VLDB 5.8300787e-05
4,964 PS2: Parameter Server on Spark 2019 SIGMOD 5.7965988e-05
5,257 Probabilistic Demand Forecasting at Scale 2017 VLDB 5.6003925e-05
5,301 ReCache: Reactive Caching for Fast Analytics over Heterogeneous Data 2018 VLDB 5.5790928e-05
5,402 Towards Scalable Hybrid Stores: Constraint-Based Rewriting to the Rescue 2019 SIGMOD 5.5278023e-05
5,441 Using Cloud Functions as Accelerator for Elastic Data Analytics 2023 SIGMOD 5.5028093e-05
5,531 Presto: A Decade of SQL Analytics at Meta 2023 SIGMOD 5.4549499e-05
5,584 Efficient Confidentiality-Preserving Data Analytics over Symmetrically Encrypted Datasets 2020 VLDB 5.4232012e-05
5,640 AutoSteer: Learned Query Optimization for Any SQL Database 2023 VLDB 5.3933314e-05
5,718 Conjunctive Queries with Comparisons 2022 SIGMOD 5.3552123e-05
5,793 Lifetime-Based Memory Management for Distributed Data Processing Systems 2016 VLDB 5.3258796e-05
5,833 LOCAT: Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications 2022 SIGMOD 5.3106182e-05
5,844 MIFO: A Query-Semantic Aware Resource Allocation Policy 2019 SIGMOD 5.3030037e-05
5,851 GraphOS: Towards Oblivious Graph Processing 2023 VLDB 5.300937e-05
5,865 ByteHTAP: ByteDance’s HTAP System with High Data Freshness and Strong Data Consistency 2022 VLDB 5.296893e-05
5,876 Membrane - Safe and Performant Data Access Controls in Apache Spark in the Presence of Imperative Code 2024 VLDB 5.2922419e-05
5,888 Magnet: Push-based Shuffle Service for Large-scale Data Processing 2020 VLDB 5.2873617e-05
5,966 Cornus: Atomic Commit for a Cloud DBMS with Storage Disaggregation 2023 VLDB 5.2517881e-05
6,066 GPU Database Systems Characterization and Optimization 2024 VLDB 5.2290447e-05
6,229 When Tree Meets Hash: Reducing Random Reads for Index Structures on Persistent Memories 2023 SIGMOD 5.1463389e-05
6,242 Helios: Hyperscale Indexing for the Cloud & Edge 2020 VLDB 5.1408379e-05
6,264 VectorH: Taking SQL-on-Hadoop to the Next Level 2016 SIGMOD 5.1348427e-05
6,282 Cheetah: Accelerating Database Queries with Switch Pruning 2020 SIGMOD 5.128797e-05
6,327 The Tensor Data Platform: Towards an AI-centric Database System 2023 CIDR 5.1083405e-05
6,378 Mitigating the Impedance Mismatch between Prediction Query Execution and Database Engine 2025 SIGMOD 5.0909804e-05

Outgoing Citations (Sorted by Pagerank)

Showing 15 of 15 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.