Database Paper Browser

Hive - A Warehousing Solution Over a Map-Reduce Framework

Summary: Hive provides a data warehousing layer on Hadoop with SQL-like HiveQL compiled to MapReduce for scalable analytics on commodity hardware. It adds an extensible IO layer, a nested type system, and a centralized Hive-Metastore catalog for statistics and optimization, enabling large-scale deployments (thousands of tables, TB-scale data). (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 9962
Venue: VLDB
Year: 2009
Pagerank: 0.00059533166
Overall Rank: 70 | 99.52%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 133 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
210 | Gorilla: A Fast, Scalable, In-Memory Time Series Database | 2015 | VLDB | 0.0003404384
310 | The Vertica Analytic Database: C-Store 7 Years Later | 2012 | VLDB | 0.00028132402
329 | Accelerating Machine Learning Inference with Probabilistic Predicates | 2018 | SIGMOD | 0.00027249545
396 | One Trillion Edges: Graph Processing at Facebook-Scale | 2015 | VLDB | 0.00024424102
538 | The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing | 2015 | VLDB | 0.00020678804
544 | Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources | 2018 | SIGMOD | 0.00020521965
794 | Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) | 2010 | VLDB | 0.00016605103
913 | Tenzing A SQL Implementation On The MapReduce Framework | 2011 | VLDB | 0.00015408131
947 | MRShare: Sharing Across Multiple Queries in MapReduce | 2010 | VLDB | 0.00015114576
979 | Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads | 2012 | VLDB | 0.0001488055
1,110 | Parallel Evaluation of Conjunctive Queries | 2011 | PODS | 0.00013968198
1,158 | Simulation of Database-Valued Markov Chains Using SimSQL | 2013 | SIGMOD | 0.0001361064
1,261 | Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce | 2013 | VLDB | 0.00012989236
1,265 | Jaql: A Scripting Language for Large Scale Semistructured Data Analysis | 2011 | VLDB | 0.00012947629
1,323 | Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters | 2016 | SIGMOD | 0.00012601997
1,402 | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML | 2014 | VLDB | 0.00012180605
1,411 | Communication Steps for Parallel Query Processing | 2013 | PODS | 0.0001212565
1,435 | Simba: Efficient In-Memory Spatial Analytics | 2016 | SIGMOD | 0.00012004456
1,440 | Provenance for Generalized Map and Reduce Workflows | 2011 | CIDR | 0.00011961469
1,477 | Fine-grained Partitioning for Aggressive Data Skipping | 2014 | SIGMOD | 0.00011770865
1,487 | Scuba: Diving into Data at Facebook | 2013 | VLDB | 0.00011701099
1,495 | Ricardo: Integrating R and Hadoop | 2010 | SIGMOD | 0.00011691049
1,574 | Approximate Query Processing: No Silver Bullet | 2017 | SIGMOD | 0.00011287495
1,615 | The Performance of MapReduce: An In-depth Study | 2010 | VLDB | 0.00011132319
1,715 | V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors | 2012 | VLDB | 0.00010803271
1,729 | Cloud-Native Database Systems at Alibaba: Opportunities and Challenges | 2019 | VLDB | 0.0001073728
1,800 | epiC: an Extensible and Scalable System for Processing Big Data | 2014 | VLDB | 0.00010512649
1,814 | Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing | 2014 | VLDB | 0.00010458107
1,863 | Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce | 2010 | VLDB | 0.00010286531
1,874 | Knowing When You’re Wrong: Building Fast and Reliable Approximate Query Processing Systems | 2014 | SIGMOD | 0.00010244443
1,922 | Selecting Subexpressions to Materialize at Datacenter Scale | 2018 | VLDB | 0.00010082599
2,027 | Titian: Data Provenance Support in Spark | 2016 | VLDB | 9.7437067e-05
2,083 | Towards a Learning Optimizer for Shared Clouds | 2019 | VLDB | 9.5834572e-05
2,205 | ReStore: Reusing Results of MapReduce Jobs | 2012 | VLDB | 9.2920002e-05
2,262 | Manu: A Cloud Native Vector Database Management System | 2022 | VLDB | 9.1624446e-05
2,322 | Instant Loading for Main Memory Databases | 2013 | VLDB | 9.034874e-05
2,338 | Samza: Stateful Scalable Stream Processing at LinkedIn | 2017 | VLDB | 9.00711e-05
2,439 | CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop | 2011 | VLDB | 8.8190594e-05
2,476 | A Platform for Scalable One-Pass Analytics using MapReduce | 2011 | SIGMOD | 8.6960139e-05
2,501 | DBEst: Revisiting Approximate Query Processing Engines with Machine Learning Models | 2019 | SIGMOD | 8.6453446e-05
2,611 | Opening the Black Boxes in Data Flow Optimization | 2012 | VLDB | 8.4536967e-05
2,667 | Cumulon: Optimizing Statistical Data Analysis in the Cloud | 2013 | SIGMOD | 8.3413995e-05
2,736 | Online Aggregation and Continuous Query support in MapReduce | 2010 | SIGMOD | 8.2043187e-05
2,818 | Implicit Parallelism through Deep Language Embedding | 2015 | SIGMOD | 8.0665558e-05
2,844 | Towards Scalable Real-time Analytics: An Architecture for Scale-out of OLxP Workloads | 2015 | VLDB | 8.0243849e-05
2,946 | BigDansing: A System for Big Data Cleansing | 2015 | SIGMOD | 7.8372441e-05
2,965 | SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment | 2016 | SIGMOD | 7.8059273e-05
3,066 | HAWQ: A Massively Parallel Processing SQL Engine in Hadoop | 2014 | SIGMOD | 7.6221974e-05
3,152 | AnalyticDB: Real-time OLAP Database System at Alibaba Cloud | 2019 | VLDB | 7.4711766e-05
3,200 | Big Data Analytics with Datalog Queries on Spark | 2016 | SIGMOD | 7.3912411e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613
42 | A Comparison of Approaches to Large-Scale Data Analysis | 2009 | SIGMOD | 0.00073498298

Semantically Similar Papers