Database Paper Browser

CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop

Summary: CoHadoop extends Hadoop with data placement hints that let applications colocate related files on the same nodes while preserving Hadoop's fault tolerance and flexibility. Colocation speeds up joins, grouping, aggregation, and sessionization in log analytics, outperforming both repartition-based algorithms and map-only algorithms that partition data without colocating it. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 10285
Venue: VLDB
Year: 2011
Pagerank: 8.8190594e-05
Overall Rank: 2,439 | 83.04%
DOI: -

Authors

Incoming Citations (Sorted by Pagerank)

Showing 17 of 17 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,674 | Minimal MapReduce Algorithms | 2013 | SIGMOD | 8.3328645e-05
3,247 | Can the Elephants Handle the NoSQL Onslaught? | 2012 | VLDB | 7.3260831e-05
3,922 | Pushing Data-Induced Predicates Through Joins in Big-Data Clusters | 2020 | VLDB | 6.6291079e-05
5,105 | Only Aggressive Elephants are Fast Elephants | 2012 | VLDB | 5.694494e-05
5,118 | AdaptDB: Adaptive Partitioning for Distributed Joins | 2017 | VLDB | 5.6820984e-05
5,790 | AQWA: Adaptive Query-Workload-Aware Partitioning of Big Spatial Data | 2015 | VLDB | 5.3269734e-05
6,173 | Exploiting Soft and Hard Correlations in Big Data Query Optimization | 2016 | VLDB | 5.1699414e-05
7,060 | SquirrelJoin: Network-Aware Distributed Join Processing with Lazy Partitioning | 2017 | VLDB | 4.8465382e-05
7,476 | Lachesis: Automatic Partitioning for UDF-Centric Analytics | 2021 | VLDB | 4.7188928e-05
7,902 | Building Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis | 2015 | CIDR | 4.6215911e-05
7,907 | Petabyte-Scale Row-Level Operations in Data Lakehouses | 2024 | VLDB | 4.6205839e-05
8,002 | Pangea: Monolithic Distributed Storage for Data Analytics | 2019 | VLDB | 4.6088289e-05
9,114 | Data Stream Warehousing in Tidalrace | 2015 | CIDR | 4.3935469e-05
9,466 | Serenade - Low-Latency Session-Based Recommendation in e-Commerce at Scale | 2022 | SIGMOD | 4.3349007e-05
11,835 | An Efficient MapReduce Cube Algorithm for Varied Data Distributions | 2016 | SIGMOD | 4.1945683e-05
11,894 | Building Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis | 2015 | CIDR | 4.1945683e-05
12,062 | Next Generation Data Analytics at IBM Research | 2013 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 10 of 10 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers