Database Paper Browser

Shark: SQL and Rich Analytics at Scale

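Summary: Shark unifies SQL query processing and complex analytics on clusters in a single scalable engine built on a distributed memory abstraction. In-memory columnar storage, mid-query replanning, and fine-grained fault tolerance let it run SQL queries up to 100x faster than Hive and machine learning programs up to 100x faster than Hadoop, with performance competitive with MPP analytic databases. (summarized by gpt-5-nano on Feb 09 2026)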

Paper ID: 4697
Venue: SIGMOD
Year: 2013
Pagerank: 0.00020595648
Overall Rank: 542 | 96.24%
DOI: -

[Chart: Incoming Non-self Citations Over Time]

Incoming Citations (Sorted by Pagerank)

Showing 50 of 54 citing papers.

Rank Citing Paper Year Venue Pagerank
66 Spark SQL: Relational Data Processing in Spark 2015 SIGMOD 0.00061639801
1,435 Simba: Efficient In-Memory Spatial Analytics 2016 SIGMOD 0.00012004456
1,477 Fine-grained Partitioning for Aggressive Data Skipping 2014 SIGMOD 0.00011770865
1,487 Scuba: Diving into Data at Facebook 2013 VLDB 0.00011701099
1,814 Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing 2014 VLDB 0.00010458107
1,874 Knowing When You’re Wrong: Building Fast and Reliable Approximate Query Processing Systems 2014 SIGMOD 0.00010244443
1,939 From Theory to Practice: Efficient Join Query Evaluation in a Parallel Database System 2015 SIGMOD 0.00010025655
2,127 SQL-on-Hadoop: Full Circle Back to Shared-Nothing Database Architectures 2014 VLDB 9.4863172e-05
2,212 Skew in Parallel Query Processing 2014 PODS 9.2771827e-05
2,412 WideTable: An Accelerator for Analytical Data Processing 2014 VLDB 8.8726508e-05
2,772 Quickstep: A Data Platform Based on the Scaling-Up Approach 2018 VLDB 8.1401661e-05
2,844 Towards Scalable Real-time Analytics: An Architecture for Scale-out of OLxP Workloads 2015 VLDB 8.0243849e-05
2,928 WANalytics: Analytics for a Geo-Distributed Data-Intensive World 2015 CIDR 7.8812874e-05
2,946 BigDansing: A System for Big Data Cleansing 2015 SIGMOD 7.8372441e-05
3,066 HAWQ: A Massively Parallel Processing SQL Engine in Hadoop 2014 SIGMOD 7.6221974e-05
3,608 Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation 2018 SIGMOD 6.924272e-05
3,821 Locality-aware Partitioning in Parallel Database Systems 2015 SIGMOD 6.7281515e-05
4,033 In-RDBMS Hardware Acceleration of Advanced Analytics 2018 VLDB 6.5113267e-05
4,046 WANalytics: Geo-Distributed Analytics for a Data Intensive World 2015 SIGMOD 6.4979392e-05
4,161 Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe? 2017 SIGMOD 6.3938006e-05
5,014 Dynamically Optimizing Queries over Large Scale Data Platforms 2014 SIGMOD 5.7586174e-05
5,119 Design Tradeoffs of Data Access Methods 2016 SIGMOD 5.6807904e-05
5,368 Fine-Grained Modeling and Optimization for Intelligent Resource Management in Big Data Processing 2022 VLDB 5.5457532e-05
5,829 A Performance Study of Big Data on Small Nodes 2015 VLDB 5.3113542e-05
6,075 Opportunistic Physical Design for Big Data Analytics 2014 SIGMOD 5.223901e-05
6,304 Elastic Pipelining in an In-Memory Database Cluster 2016 SIGMOD 5.1210182e-05
6,784 SparkR: Scaling R Programs with Spark 2016 SIGMOD 4.9265155e-05
6,802 Understanding Insights into the Basic Structure and Essential Issues of Table Placement Methods in Clusters 2013 VLDB 4.9226626e-05
6,809 Adaptive Data Skipping in Main-Memory Systems 2016 SIGMOD 4.9206606e-05
6,856 Liquid: Unifying Nearline and Offline Big Data Integration 2015 CIDR 4.9060615e-05
6,871 Towards General and Efficient Online Tuning for Spark 2023 VLDB 4.8997004e-05
6,895 Decentralized Actor Scheduling and Reference-based Storage in Xorbits: a Native Scalable Data Science Engine 2025 VLDB 4.8925595e-05
7,059 Adaptive and Robust Query Execution for Lakehouses at Scale 2024 VLDB 4.8477825e-05
7,067 JetScope: Reliable and Interactive Analytics at Cloud Scale 2015 VLDB 4.8440936e-05
7,207 Kodiak: Leveraging Materialized Views For Very Low-Latency Analytics Over High-Dimensional Web-Scale Data 2016 VLDB 4.800763e-05
7,369 Using VDMS to Index and Search 100M Images 2021 VLDB 4.750437e-05
7,387 Bubble Execution: Resource-aware Reliable Analytics at Cloud Scale 2018 VLDB 4.7438193e-05
7,599 Quill: Efficient, Transferable, and Rich Analytics at Scale 2016 VLDB 4.7003593e-05
7,920 JoinBoost: Grow Trees Over Normalized Data Using Only SQL 2023 VLDB 4.6163888e-05
8,197 SparkCruise: Workload Optimization in Managed Spark Clusters at Microsoft 2021 VLDB 4.5607121e-05
8,215 Parallel-Correctness and Transferability for Conjunctive Queries 2015 PODS 4.5577562e-05
8,464 Piranha: Optimizing Short Jobs in Hadoop 2013 VLDB 4.5052127e-05
8,617 A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning 2024 VLDB 4.4846425e-05
8,924 QMapper for Smart Grid: Migrating SQL-based Application to Hive 2015 SIGMOD 4.427232e-05
9,448 Cost-based Fault-tolerance for Parallel Data Processing 2015 SIGMOD 4.3401906e-05
9,584 Introduction to Spark 2.0 for Database Researchers 2016 SIGMOD 4.3218691e-05
10,404 Dynamic Pruning for Recursive Joins 2025 SIGMOD 4.1945683e-05
11,690 Integration of Large-Scale Data Processing Systems and Traditional Parallel Database Technology 2019 VLDB 4.1945683e-05
11,831 Logical Aspects of Massively Parallel and Distributed Systems 2016 PODS 4.1945683e-05
11,882 Parallel Evaluation of Multi-Semi-Joins 2016 VLDB 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 18 of 18 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
4 Pregel: A System for Large-Scale Graph Processing 2010 SIGMOD 0.0019005923
21 C-Store: A Column-oriented DBMS 2005 VLDB 0.00086087497
22 SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets 2008 VLDB 0.0008456613
37 Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud 2012 VLDB 0.0007522744
42 A Comparison of Approaches to Large-Scale Data Analysis 2009 SIGMOD 0.00073498298
109 Dremel: Interactive Analysis of Web-Scale Datasets 2010 VLDB 0.00048186983
115 Eddies: Continuously Adaptive Query Processing 2000 SIGMOD 0.00046221215
157 HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads 2009 VLDB 0.00040397359
168 MAD Skills: New Analysis Practices for Big Data 2009 VLDB 0.00038946305
220 Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans 1998 SIGMOD 0.00033194808
413 HaLoop: Efficient Iterative Data Processing on Large Clusters 2010 VLDB 0.00023904409
456 Cost-based Query Scrambling for Initial Delays 1998 SIGMOD 0.00022717134
658 Towards a Unified Architecture for in-RDBMS Analytics 2012 SIGMOD 0.00018506577
913 Tenzing A SQL Implementation On The MapReduce Framework 2011 VLDB 0.00015408131
1,334 SkewTune: Mitigating Skew in MapReduce Applications 2012 SIGMOD 0.0001250413
1,470 Processing a Trillion Cells per Mouse Click 2012 VLDB 0.00011833779
1,721 Distributed Data-Parallel Computing Using a High-Level Programming Language 2009 SIGMOD 0.00010762918
1,863 Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce 2010 VLDB 0.00010286531

Semantically Similar Papers