Database Paper Browser

Spark SQL: Relational Data Processing in Spark

Summary: Spark SQL integrates relational processing into Spark through the DataFrame API, unifying SQL queries with Spark's functional programming workflow. Catalyst, an extensible optimizer written in Scala, supports composable rules, code generation, JSON schema inference, and query federation to external databases. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 5022
Venue: SIGMOD
Year: 2015
Pagerank: 0.00061639801
Overall Rank: 66 | 99.55%
DOI: 10.1145/2723372.2742797

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 50 of 206 citing papers.

Rank Citing Paper Year Venue Pagerank
9,268 Language-Agnostic Integrated Queries in a Managed Polyglot Runtime 2021 VLDB 4.3657168e-05
9,289 In-Browser Interactive SQL Analytics with Afterburner 2017 SIGMOD 4.362197e-05
9,327 SODA: A Set of Fast Oblivious Algorithms in Distributed Secure Data Analytics 2023 VLDB 4.3556432e-05
9,330 Parallel Query Processing: To Separate Communication from Computation 2022 SIGMOD 4.3556432e-05
9,332 PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development 2018 SIGMOD 4.3556432e-05
9,379 GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example 2023 SIGMOD 4.3462787e-05
9,414 TreeToaster: Towards an IVM-Optimized Compiler 2021 SIGMOD 4.3441378e-05
9,434 Rock: Cleaning Data by Embedding ML in Logic Rules 2024 SIGMOD 4.3430376e-05
9,516 [Demo] Low-latency Spark Queries on Updatable Data 2019 SIGMOD 4.3335877e-05
9,584 Introduction to Spark 2.0 for Database Researchers 2016 SIGMOD 4.3218691e-05
9,607 Polyglot Data Management: State of the Art & Open Challenges 2022 VLDB 4.3177432e-05
9,702 Evaluating Query Languages and Systems for High-Energy Physics Data 2022 VLDB 4.3008468e-05
9,719 Tuplex: Robust, Efficient Analytics When Python Rules 2019 VLDB 4.2980763e-05
9,848 Saving Money for Analytical Workloads in the Cloud 2024 VLDB 4.2721228e-05
9,913 Chukonu: A Fully-Featured High-Performance Big Data Framework that Integrates a Native Compute Engine into Spark 2022 VLDB 4.2565279e-05
10,144 Beyond Relational: Semantic-Aware Multi-Modal Analytics with LLM-Native Query Optimization 2026 SIGMOD 4.1945683e-05
10,257 SIDLE: Tree-structure Aware Indexes for CXL-based Heterogeneous Memory 2026 VLDB 4.1945683e-05
10,385 Optimizing Block Skipping for High-Dimensional Data with Learned Adaptive Curve 2025 SIGMOD 4.1945683e-05
10,408 Managed Resource Scaling in Amazon EMR 2025 SIGMOD 4.1945683e-05
10,411 OpenMLDB: A Real-Time Relational Data Feature Computation System for Online ML 2025 SIGMOD 4.1945683e-05
10,491 Intra-Query Runtime Elasticity for Cloud-Native Data Analysis 2025 SIGMOD 4.1945683e-05
10,551 Avoiding Materialisation for Guarded Aggregate Queries 2025 VLDB 4.1945683e-05
10,568 QOVIS: Understanding and Diagnosing Query Optimizer via a Visualization-assisted Approach 2025 VLDB 4.1945683e-05
10,591 Accio: Bolt-on Query Federation 2025 VLDB 4.1945683e-05
10,662 ArrayMorph: Optimizing Hyperslab Queries on the Cloud for Machine Learning Pipelines 2025 VLDB 4.1945683e-05
10,714 Towards Designing Future-Proof Data Processing Systems 2025 VLDB 4.1945683e-05
10,736 TreeCat: Standalone Catalog Engine for Large Data Systems 2025 VLDB 4.1945683e-05
10,770 cedar: Optimized and Unified Machine Learning Input Data Pipelines 2025 VLDB 4.1945683e-05
10,778 GRewriter: Practical Query Rewriting with Automatic Rule Set Expansion in GaussDB 2025 VLDB 4.1945683e-05
10,852 CloudGlide: Deconstructing the Landscape of Cloud-Based Analytics 2025 VLDB 4.1945683e-05
10,868 LEAP: A Low-cost Spark SQL Query Optimizer using Pairwise Comparison 2025 VLDB 4.1945683e-05
10,883 IcedTea: Efficient and Responsive Time-Travel Debugging in Dataflow Systems 2025 VLDB 4.1945683e-05
10,969 Query Compilation Without Regrets 2024 SIGMOD 4.1945683e-05
11,056 Agile-Ant: Self-managing Distributed Cache Management for Cost Optimization of Big Data Applications 2024 VLDB 4.1945683e-05
11,066 OLAP on Modern Chiplet-Based Processors 2024 VLDB 4.1945683e-05
11,082 Large-Scale Metric Computation in Online Controlled Experiment Platform 2024 VLDB 4.1945683e-05
11,146 Raising the Level of Abstraction for Time-State Analytics With the Timeline Framework 2023 CIDR 4.1945683e-05
11,154 Templating Shuffles 2023 CIDR 4.1945683e-05
11,189 dsJSON: A Distributed SQL JSON Processor 2023 SIGMOD 4.1945683e-05
11,221 TEE-based General-purpose Computational Backend for Secure Delegated Data Processing 2023 SIGMOD 4.1945683e-05
11,267 Anser: Adaptive Information Sharing Framework of AnalyticDB 2023 VLDB 4.1945683e-05
11,295 XDB in Action: Decentralized Cross-Database Query Processing for Black-Box DBMSes 2023 VLDB 4.1945683e-05
11,341 Juggler: Autonomous Cost Optimization and Performance Prediction of Big Data Applications 2022 SIGMOD 4.1945683e-05
11,479 Vertex-centric Parallel Computation of SQL Queries 2021 SIGMOD 4.1945683e-05
11,485 Real-time Data Infrastructure at Uber 2021 SIGMOD 4.1945683e-05
11,522 AnyOLAP: Analytical Processing of Arbitrary Data-Intensive Applications without ETL 2021 VLDB 4.1945683e-05
11,531 Fangorn: Adaptive Execution Framework for Heterogeneous Workloads on Shared Clusters 2021 VLDB 4.1945683e-05
11,576 RASQL: A Powerful Language and its System for Big Data Applications 2020 SIGMOD 4.1945683e-05
11,672 Block as a Value for SQL over NoSQL 2019 VLDB 4.1945683e-05
11,690 Integration of Large-Scale Data Processing Systems and Traditional Parallel Database Technology 2019 VLDB 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 15 of 15 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers