Database Paper Browser

Hive - A Warehousing Solution Over a Map-Reduce Framework

Summary: Hive is a data warehousing layer built on Hadoop. Queries written in its SQL-like language, HiveQL, are compiled into MapReduce jobs, which allows scalable analytics on commodity hardware. Hive adds an extensible IO layer, a nested type system, and a centralized catalog, the Hive-Metastore, that stores schemas and statistics used for query optimization. The design supports large-scale deployments (thousands of tables, TB-scale data). (summarized by gpt-5-nano on Feb 09 2026)
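
As a rough illustration of what compiling a SQL-like query to MapReduce means, the sketch below evaluates a query shaped like SELECT status, COUNT(*) FROM logs GROUP BY status as a map phase, a shuffle, and a reduce phase. It is a toy in-memory model and not Hive's actual compiler or execution engine; the table name `logs`, the column `status`, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit one (group key, 1) pair per input row (hypothetical 'status' column)."""
    for row in rows:
        yield row["status"], 1


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, which yields COUNT(*) per group."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_status(rows):
    """Run the three phases in order over an in-memory list of rows."""
    return reduce_phase(shuffle(map_phase(rows)))


# Hypothetical sample rows standing in for a 'logs' table.
rows = [{"status": "ok"}, {"status": "error"}, {"status": "ok"}]
result = count_by_status(rows)
```

In a real cluster the map and reduce functions run on different machines and the shuffle moves data over the network; here everything runs in one process purely to show the shape of the computation.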

Paper ID: 9962
Venue: VLDB
Year: 2009
Pagerank: 0.00059533166
Overall Rank: 70 | 99.52%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 133 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
3,279 | Early Accurate Results for Advanced Analytics on MapReduce | 2012 | VLDB | 7.2855494e-05
3,355 | F1 Query: Declarative Querying at Scale | 2018 | VLDB | 7.1829142e-05
3,504 | M3R: Increased Performance for In-Memory Hadoop Jobs | 2012 | VLDB | 7.0347515e-05
3,517 | Integrating Hadoop and Parallel DBMS | 2010 | SIGMOD | 7.0199423e-05
3,548 | Adaptive Query Processing on RAW Data | 2014 | VLDB | 6.9859242e-05
3,703 | Multi-Query Optimization in MapReduce Framework | 2014 | VLDB | 6.8289978e-05
3,753 | Choosing A Cloud DBMS: Architectures and Tradeoffs | 2019 | VLDB | 6.7871241e-05
3,834 | GTS: A Fast and Scalable Graph Processing Method based on Streaming Topology to GPUs | 2016 | SIGMOD | 6.7173094e-05
3,891 | Slalom: Coasting Through Raw Data via Adaptive Partitioning and Indexing | 2017 | VLDB | 6.659442e-05
3,922 | Pushing Data-Induced Predicates Through Joins in Big-Data Clusters | 2020 | VLDB | 6.6291079e-05
3,947 | Unicorn: A System for Searching the Social Graph | 2013 | VLDB | 6.5967528e-05
4,174 | Computation Reuse in Analytics Job Service at Microsoft | 2018 | SIGMOD | 6.3856219e-05
4,188 | Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications | 2015 | SIGMOD | 6.3753681e-05
4,201 | Meet Charles, big data query advisor | 2013 | CIDR | 6.3639451e-05
4,326 | Fast Queries Over Heterogeneous Data Through Engine Customization | 2016 | VLDB | 6.288323e-05
4,514 | An Empirical Evaluation of Columnar Storage Formats | 2024 | VLDB | 6.1204636e-05
4,689 | Algorithmic Aspects of Parallel Query Processing | 2018 | SIGMOD | 5.9980099e-05
4,704 | JSON Tiles: Fast Analytics on Semi-Structured Data | 2021 | SIGMOD | 5.9853687e-05
4,767 | Pinot: Realtime OLAP for 530 Million Users | 2018 | SIGMOD | 5.9364731e-05
4,861 | OctopusFS: A Distributed File System with Tiered Storage Management | 2017 | SIGMOD | 5.8708916e-05
5,014 | Dynamically Optimizing Queries over Large Scale Data Platforms | 2014 | SIGMOD | 5.7586174e-05
5,185 | Yugong: Geo-Distributed Data and Job Placement at Scale | 2019 | VLDB | 5.6405374e-05
5,286 | StreamOps: Cloud-Native Runtime Management for Streaming Services in ByteDance | 2023 | VLDB | 5.5838392e-05
5,301 | ReCache: Reactive Caching for Fast Analytics over Heterogeneous Data | 2018 | VLDB | 5.5790928e-05
5,368 | Fine-Grained Modeling and Optimization for Intelligent Resource Management in Big Data Processing | 2022 | VLDB | 5.5457532e-05
5,535 | Lightweight Cardinality Estimation in LSM-based Systems | 2018 | SIGMOD | 5.4539235e-05
5,790 | AQWA: Adaptive Query-Workload-Aware Partitioning of Big Spatial Data | 2015 | VLDB | 5.3269734e-05
5,833 | LOCAT: Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications | 2022 | SIGMOD | 5.3106182e-05
5,844 | MIFO: A Query-Semantic Aware Resource Allocation Policy | 2019 | SIGMOD | 5.3030037e-05
5,903 | Building Wavelet Histograms on Large Data in MapReduce | 2012 | VLDB | 5.2791351e-05
5,980 | The Era of Big Spatial Data | 2017 | VLDB | 5.2449608e-05
6,117 | REEF: Retainable Evaluator Execution Framework | 2015 | SIGMOD | 5.2036631e-05
6,298 | Hillview: A trillion-cell spreadsheet for big data | 2019 | VLDB | 5.1226987e-05
6,304 | Elastic Pipelining in an In-Memory Database Cluster | 2016 | SIGMOD | 5.1210182e-05
6,402 | BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse | 2024 | SIGMOD | 5.079818e-05
6,407 | Just-In-Time Data Virtualization: Lightweight Data Management with ViDa | 2015 | CIDR | 5.076547e-05
6,483 | Towards Unified Ad-hoc Data Processing | 2014 | SIGMOD | 5.0456397e-05
6,498 | Memory-Aware Framework for Efficient Second-Order Random Walk on Large Graphs | 2020 | SIGMOD | 5.0392468e-05
6,590 | Interactive Demonstration of Probabilistic Predicates | 2018 | SIGMOD | 5.0010949e-05
6,673 | Incorporating Super-Operators in Big-Data Query Optimizers | 2020 | VLDB | 4.966799e-05
6,674 | Exploiting Common Patterns for Tree-Structured Data | 2017 | SIGMOD | 4.9663344e-05
6,757 | KEA: Tuning an Exabyte-Scale Data Infrastructure | 2021 | SIGMOD | 4.9372134e-05
6,856 | Liquid: Unifying Nearline and Offline Big Data Integration | 2015 | CIDR | 4.9060615e-05
7,067 | JetScope: Reliable and Interactive Analytics at Cloud Scale | 2015 | VLDB | 4.8440936e-05
7,198 | BSMA: A Benchmark for Analytical Queries over Social Media Data | 2014 | VLDB | 4.8033496e-05
7,207 | Kodiak: Leveraging Materialized Views For Very Low-Latency Analytics Over High-Dimensional Web-Scale Data | 2016 | VLDB | 4.800763e-05
7,270 | Oracle In-Database Hadoop: When MapReduce Meets RDBMS | 2012 | SIGMOD | 4.7813984e-05
7,399 | SmartBench: A Benchmark For Data Management In Smart Spaces | 2020 | VLDB | 4.7410149e-05
7,469 | Bullion: A Column Store for Machine Learning | 2025 | CIDR | 4.7204398e-05
7,476 | Lachesis: Automatic Partitioning for UDF-Centric Analytics | 2021 | VLDB | 4.7188928e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613
42 | A Comparison of Approaches to Large-Scale Data Analysis | 2009 | SIGMOD | 0.00073498298

Semantically Similar Papers