Database Paper Browser

Hive - A Warehousing Solution Over a Map-Reduce Framework

Summary: Hive provides a data warehousing layer on Hadoop with SQL-like HiveQL compiled to MapReduce for scalable analytics on commodity hardware. It adds an extensible IO layer, a nested type system, and a centralized Hive-Metastore catalog for statistics and optimization, enabling large-scale deployments (thousands of tables, TB-scale data). (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 9962
Venue: VLDB
Year: 2009
Pagerank: 0.00059533166
Overall Rank: 70 | 99.52%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 33 of 133 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
7,534 | Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams | 2022 | VLDB | 4.7180004e-05
7,599 | Quill: Efficient, Transferable, and Rich Analytics at Scale | 2016 | VLDB | 4.7003593e-05
7,778 | Runtime Variation in Big Data Analytics | 2023 | SIGMOD | 4.653651e-05
8,464 | Piranha: Optimizing Short Jobs in Hadoop | 2013 | VLDB | 4.5052127e-05
8,506 | New Query Optimization Techniques in the Spark Engine of Azure Synapse | 2022 | VLDB | 4.4957661e-05
8,617 | A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning | 2024 | VLDB | 4.4846425e-05
8,758 | Hyperspace: The Indexing Subsystem of Azure Synapse | 2021 | VLDB | 4.456315e-05
8,924 | QMapper for Smart Grid: Migrating SQL-based Application to Hive | 2015 | SIGMOD | 4.427232e-05
9,118 | Towards Observability for Production Machine Learning Pipelines | 2022 | VLDB | 4.3928288e-05
9,194 | Phoebe: A Learning-based Checkpoint Optimizer | 2021 | VLDB | 4.3761777e-05
9,201 | F3: The Open-Source Data File Format for the Future | 2026 | SIGMOD | 4.3743539e-05
9,347 | Rank Join Queries in NoSQL Databases | 2014 | VLDB | 4.3526718e-05
9,376 | Versatile Optimization of UDF-heavy Data Flows with Sofa | 2014 | SIGMOD | 4.347376e-05
9,384 | Sapprox: Enabling Efficient and Accurate Approximations on Sub-datasets with Distribution-aware Online Sampling | 2017 | VLDB | 4.3456129e-05
9,692 | GHive: A Demonstration of GPU-Accelerated Query Processing in Apache Hive | 2022 | SIGMOD | 4.302852e-05
9,894 | OceanRT: Real-Time Analytics over Large Temporal Data | 2014 | SIGMOD | 4.2602616e-05
10,196 | PTO: A Workload-driven Predictive Table Optimizer for Lakehouse Systems | 2026 | SIGMOD | 4.1945683e-05
10,411 | OpenMLDB: A Real-Time Relational Data Feature Computation System for Online ML | 2025 | SIGMOD | 4.1945683e-05
10,568 | QOVIS: Understanding and Diagnosing Query Optimizer via a Visualization-assisted Approach | 2025 | VLDB | 4.1945683e-05
10,636 | Concurrency Control as a Service | 2025 | VLDB | 4.1945683e-05
10,662 | ArrayMorph: Optimizing Hyperslab Queries on the Cloud for Machine Learning Pipelines | 2025 | VLDB | 4.1945683e-05
11,084 | Presto’s History-based Query Optimizer | 2024 | VLDB | 4.1945683e-05
11,389 | CDI-E: An Elastic Cloud Service for Data Engineering | 2022 | VLDB | 4.1945683e-05
11,690 | Integration of Large-Scale Data Processing Systems and Traditional Parallel Database Technology | 2019 | VLDB | 4.1945683e-05
11,831 | Logical Aspects of Massively Parallel and Distributed Systems | 2016 | PODS | 4.1945683e-05
11,859 | dmapply: A functional primitive to express distributed machine learning algorithms in R | 2016 | VLDB | 4.1945683e-05
11,890 | Let's Rethink Join Optimization in Distributed Systems | 2015 | CIDR | 4.1945683e-05
11,916 | A Demonstration of Rubato DB: A Highly Scalable NewSQL Database System for OLTP and Big Data Applications | 2015 | SIGMOD | 4.1945683e-05
11,976 | Anti-Combining for MapReduce | 2014 | SIGMOD | 4.1945683e-05
11,999 | Getting Your Big Data Priorities Straight: A Demonstration of Priority-based QoS using Social-network-driven Stock Recommendation | 2014 | VLDB | 4.1945683e-05
12,005 | Design and Implementation of a Real-Time Interactive Analytics System for Large Spatio-Temporal Data | 2014 | VLDB | 4.1945683e-05
12,109 | Declarative Error Management for Robust Data-Intensive Applications | 2012 | SIGMOD | 4.1945683e-05
12,203 | Resiliency-Aware Data Management | 2011 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613
42 | A Comparison of Approaches to Large-Scale Data Analysis | 2009 | SIGMOD | 0.00073498298

Semantically Similar Papers