Database Paper Browser

Impala: A Modern, Open-Source SQL Engine for Hadoop

Summary: Impala is an open-source MPP SQL engine for Hadoop that provides low-latency, high-concurrency execution for BI and read-mostly analytic queries, where batch frameworks (e.g., Hive) fall short. The paper presents its architecture and components and reports empirical results in which it outperforms other SQL-on-Hadoop systems. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 271
Venue: CIDR
Year: 2015
Pagerank: 0.00022226941
Overall Rank: 476 | 96.70%
DOI: -

Incoming Non-self Citations Over Time

Incoming Citations (Sorted by Pagerank)

Showing 50 of 62 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801
1,409 | High-Speed Query Processing over High-Speed Networks | 2016 | VLDB | 0.00012132768
1,729 | Cloud-Native Database Systems at Alibaba: Opportunities and Challenges | 2019 | VLDB | 0.0001073728
1,792 | Hybrid Transactional/Analytical Processing: A Survey | 2017 | SIGMOD | 0.00010537893
1,864 | Relaxed Operator Fusion for In-Memory Databases: Making Compilation, Vectorization, and Prefetching Work Together At Last | 2018 | VLDB | 0.00010280966
2,154 | DIFF: A Relational Interface for Large-Scale Data Explanation | 2019 | VLDB | 9.4208667e-05
2,471 | Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing, Compression, and Decoupled Logical Sparsity | 2018 | VLDB | 8.7320072e-05
2,501 | DBEst: Revisiting Approximate Query Processing Engines with Machine Learning Models | 2019 | SIGMOD | 8.6453446e-05
2,545 | POLARIS: The Distributed SQL Engine in Azure Synapse | 2020 | VLDB | 8.5725413e-05
2,838 | How to Architect a Query Compiler, Revisited | 2018 | SIGMOD | 8.0408472e-05
2,844 | Towards Scalable Real-time Analytics: An Architecture for Scale-out of OLxP Workloads | 2015 | VLDB | 8.0243849e-05
3,058 | Rethinking Data-Intensive Science Using Scalable Analytics Systems | 2015 | SIGMOD | 7.6410159e-05
3,152 | AnalyticDB: Real-time OLAP Database System at Alibaba Cloud | 2019 | VLDB | 7.4711766e-05
3,355 | F1 Query: Declarative Querying at Scale | 2018 | VLDB | 7.1829142e-05
3,608 | Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation | 2018 | SIGMOD | 6.924272e-05
3,704 | How to Win a Hot Dog Eating Contest: Distributed Incremental View Maintenance with Batch Updates | 2016 | SIGMOD | 6.827494e-05
3,763 | Flexible Rule-Based Decomposition and Metadata Independence in Modin: A Parallel Dataframe System | 2022 | VLDB | 6.7801795e-05
3,891 | Slalom: Coasting Through Raw Data via Adaptive Partitioning and Indexing | 2017 | VLDB | 6.659442e-05
3,973 | Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing | 2019 | SIGMOD | 6.5758017e-05
3,982 | The Myria Big Data Management and Analytics System and Cloud Service | 2017 | CIDR | 6.5651188e-05
4,158 | Performance-Optimal Filtering: Bloom Overtakes Cuckoo at High Throughput | 2019 | VLDB | 6.3994318e-05
4,188 | Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications | 2015 | SIGMOD | 6.3753681e-05
4,262 | Efficient Processing of Window Functions in Analytical SQL Queries | 2015 | VLDB | 6.3117226e-05
4,368 | Evolving Databases for New-Gen Big Data Applications | 2017 | CIDR | 6.2491345e-05
4,390 | LogStore: A Cloud-Native and Multi-Tenant Log Database | 2021 | SIGMOD | 6.2279149e-05
4,688 | Alibaba Hologres: A Cloud-Native Service for Hybrid Serving/Analytical Processing | 2020 | VLDB | 5.9980609e-05
4,767 | Pinot: Realtime OLAP for 530 Million Users | 2018 | SIGMOD | 5.9364731e-05
5,441 | Using Cloud Functions as Accelerator for Elastic Data Analytics | 2023 | SIGMOD | 5.5028093e-05
5,535 | Lightweight Cardinality Estimation in LSM-based Systems | 2018 | SIGMOD | 5.4539235e-05
5,980 | The Era of Big Spatial Data | 2017 | VLDB | 5.2449608e-05
6,264 | VectorH: Taking SQL-on-Hadoop to the Next Level | 2016 | SIGMOD | 5.1348427e-05
6,298 | Hillview: A trillion-cell spreadsheet for big data | 2019 | VLDB | 5.1226987e-05
6,339 | Incremental Computation of Common Windowed Holistic Aggregates | 2016 | VLDB | 5.1051458e-05
6,367 | Good to the Last Bit: Data-Driven Encoding with CodecDB | 2021 | SIGMOD | 5.0941072e-05
6,404 | ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation | 2019 | VLDB | 5.0786954e-05
6,784 | SparkR: Scaling R Programs with Spark | 2016 | SIGMOD | 4.9265155e-05
6,809 | Adaptive Data Skipping in Main-Memory Systems | 2016 | SIGMOD | 4.9206606e-05
7,059 | Adaptive and Robust Query Execution for Lakehouses at Scale | 2024 | VLDB | 4.8477825e-05
7,067 | JetScope: Reliable and Interactive Analytics at Cloud Scale | 2015 | VLDB | 4.8440936e-05
7,207 | Kodiak: Leveraging Materialized Views For Very Low-Latency Analytics Over High-Dimensional Web-Scale Data | 2016 | VLDB | 4.800763e-05
7,335 | MorphStore: Analytical Query Engine with a Holistic Compression-Enabled Processing Model | 2020 | VLDB | 4.7603723e-05
7,387 | Bubble Execution: Resource-aware Reliable Analytics at Cloud Scale | 2018 | VLDB | 4.7438193e-05
7,469 | Bullion: A Column Store for Machine Learning | 2025 | CIDR | 4.7204398e-05
7,534 | Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams | 2022 | VLDB | 4.7180004e-05
7,599 | Quill: Efficient, Transferable, and Rich Analytics at Scale | 2016 | VLDB | 4.7003593e-05
7,691 | Bringing Compiling Databases to RISC Architectures | 2023 | VLDB | 4.6762283e-05
7,818 | A Survey and Experimental Comparison of Distributed SPARQL Engines for Very Large RDF Data | 2017 | VLDB | 4.6434716e-05
7,866 | Operational Analytics Data Management Systems | 2016 | VLDB | 4.6321795e-05
8,231 | FusionInsight LibrA: Huawei’s Enterprise Cloud Data Analytics Platform | 2018 | VLDB | 4.5539609e-05
8,502 | Conditional Cuckoo Filters | 2021 | SIGMOD | 4.4972336e-05

Outgoing Citations (Sorted by Pagerank)

Showing 6 of 6 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
