Database Paper Browser

Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience

Summary: Pig provides SQL-like data manipulation on MapReduce by building explicit dataflows interleaved with UDFs, compiled to Hadoop jobs. It discusses challenges and compares Pig's performance to hand-tuned MapReduce, showing productivity gains with modest overhead. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 9833
Venue: VLDB
Year: 2009
Pagerank: 0.00016775082
Overall Rank: 780 | 94.58%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 33 of 33 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171
538 | The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing | 2015 | VLDB | 0.00020678804
794 | Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) | 2010 | VLDB | 0.00016605103
947 | MRShare: Sharing Across Multiple Queries in MapReduce | 2010 | VLDB | 0.00015114576
979 | Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads | 2012 | VLDB | 0.0001488055
1,110 | Parallel Evaluation of Conjunctive Queries | 2011 | PODS | 0.00013968198
1,261 | Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce | 2013 | VLDB | 0.00012989236
1,334 | SkewTune: Mitigating Skew in MapReduce Applications | 2012 | SIGMOD | 0.0001250413
1,863 | Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce | 2010 | VLDB | 0.00010286531
2,205 | ReStore: Reusing Results of MapReduce Jobs | 2012 | VLDB | 9.2920002e-05
2,747 | Stubby: A Transformation-based Optimizer for MapReduce Workflows | 2012 | VLDB | 8.1828918e-05
3,200 | Big Data Analytics with Datalog Queries on Spark | 2016 | SIGMOD | 7.3912411e-05
3,517 | Integrating Hadoop and Parallel DBMS | 2010 | SIGMOD | 7.0199423e-05
3,601 | Large-Scale Machine Learning at Twitter | 2012 | SIGMOD | 6.9315087e-05
3,703 | Multi-Query Optimization in MapReduce Framework | 2014 | VLDB | 6.8289978e-05
4,132 | Advanced Join Strategies for Large-Scale Distributed Computation | 2014 | VLDB | 6.4241067e-05
4,425 | Nova: Continuous Pig/Hadoop Workflows | 2011 | SIGMOD | 6.198382e-05
4,572 | The Unified Logging Infrastructure for Data Analytics at Twitter | 2012 | VLDB | 6.0760183e-05
4,603 | Inspector Gadget: A Framework for Custom Monitoring and Debugging of Distributed Dataflows | 2011 | VLDB | 6.0554018e-05
5,014 | Dynamically Optimizing Queries over Large Scale Data Platforms | 2014 | SIGMOD | 5.7586174e-05
5,341 | Inspector Gadget: A Framework for Custom Monitoring and Debugging of Distributed Dataflows | 2011 | SIGMOD | 5.5607484e-05
5,368 | Fine-Grained Modeling and Optimization for Intelligent Resource Management in Big Data Processing | 2022 | VLDB | 5.5457532e-05
5,558 | A Hadoop Based Distributed Loading Approach to Parallel Data Warehouses | 2011 | SIGMOD | 5.4341353e-05
5,903 | Building Wavelet Histograms on Large Data in MapReduce | 2012 | VLDB | 5.2791351e-05
6,131 | Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture | 2013 | SIGMOD | 5.1956688e-05
6,173 | Exploiting Soft and Hard Correlations in Big Data Query Optimization | 2016 | VLDB | 5.1699414e-05
7,294 | Optimization for iterative queries on MapReduce | 2014 | VLDB | 4.773119e-05
8,617 | A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning | 2024 | VLDB | 4.4846425e-05
8,978 | SpongeFiles: Mitigating Data Skew in MapReduce Using Distributed Memory | 2014 | SIGMOD | 4.417225e-05
9,375 | Efficient Big Data Processing in Hadoop MapReduce | 2012 | VLDB | 4.347384e-05
9,504 | Supporting Scalable Analytics with Latency Constraints | 2015 | VLDB | 4.3341665e-05
11,976 | Anti-Combining for MapReduce | 2014 | SIGMOD | 4.1945683e-05
12,203 | Resiliency-Aware Data Management | 2011 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 5 of 5 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers