Database Paper Browser

Pig Latin: A Not-So-Foreign Language for Data Processing

Summary: Pig Latin sits between declarative SQL and low-level, procedural MapReduce, letting analysts who prefer a procedural style express data flows without writing MapReduce code by hand. Pig compiles Pig Latin programs into MapReduce plans that run on Hadoop and includes an integrated debugging environment; it is an open-source Apache Incubator project deployed at Yahoo scale. (summarized by gpt-5-nano on Feb 09 2026)
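
To make the "data flow compiled to MapReduce" idea concrete, here is a minimal Python sketch of a filter, group, and aggregate pipeline written as explicit map, shuffle, and reduce phases. It is only an illustration, not Pig's actual compiler or API: the sample rows, the function names (map_phase, shuffle, reduce_phase, run), and the thresholds are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample rows: (url, category, pagerank).
URLS = [
    ("a.example", "news", 0.9),
    ("b.example", "news", 0.5),
    ("c.example", "news", 0.1),
    ("d.example", "sports", 0.8),
]


def map_phase(rows, min_pagerank):
    """Filter step: keep high-pagerank rows and emit (category, pagerank)."""
    for _url, category, pagerank in rows:
        if pagerank > min_pagerank:
            yield category, pagerank


def shuffle(pairs):
    """Group step: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, min_group_size):
    """Drop small groups, then compute the average pagerank of each remaining one."""
    return {
        category: sum(values) / len(values)
        for category, values in groups.items()
        if len(values) >= min_group_size
    }


def run(rows, min_pagerank=0.2, min_group_size=2):
    """Chain the three phases in the order a data flow would be executed."""
    return reduce_phase(shuffle(map_phase(rows, min_pagerank)), min_group_size)


if __name__ == "__main__":
    print(run(URLS))
```

Each function corresponds to one step of the flow, which is roughly the work a higher-level data-flow language saves the analyst from writing by hand.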

Paper ID: 4057
Venue: SIGMOD
Year: 2008
Pagerank: 0.0024183614
Overall Rank: 3 | 99.99%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 154 citing papers.

Rank Citing Paper Year Venue Pagerank
2,338 Samza: Stateful Scalable Stream Processing at LinkedIn 2017 VLDB 9.00711e-05
2,419 Towards a One Size Fits All Database Architecture 2011 CIDR 8.853712e-05
2,439 CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop 2011 VLDB 8.8190594e-05
2,458 REX: Recursive, Delta-Based Data-Centric Computation 2012 VLDB 8.7683462e-05
2,476 A Platform for Scalable One-Pass Analytics using MapReduce 2011 SIGMOD 8.6960139e-05
2,575 A Latency and Fault-Tolerance Optimizer for Online Parallel Query Plans 2011 SIGMOD 8.5133576e-05
2,611 Opening the Black Boxes in Data Flow Optimization 2012 VLDB 8.4536967e-05
2,667 Cumulon: Optimizing Statistical Data Analysis in the Cloud 2013 SIGMOD 8.3413995e-05
2,674 Minimal MapReduce Algorithms 2013 SIGMOD 8.3328645e-05
2,736 Online Aggregation and Continuous Query support in MapReduce 2010 SIGMOD 8.2043187e-05
2,818 Implicit Parallelism through Deep Language Embedding 2015 SIGMOD 8.0665558e-05
2,928 WANalytics: Analytics for a Geo-Distributed Data-Intensive World 2015 CIDR 7.8812874e-05
2,946 BigDansing: A System for Big Data Cleansing 2015 SIGMOD 7.8372441e-05
3,066 HAWQ: A Massively Parallel Processing SQL Engine in Hadoop 2014 SIGMOD 7.6221974e-05
3,208 Column-Oriented Storage Techniques for MapReduce 2011 VLDB 7.3781897e-05
3,265 RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! - 2018 VLDB 7.3083672e-05
3,279 Early Accurate Results for Advanced Analytics on MapReduce 2012 VLDB 7.2855494e-05
3,355 F1 Query: Declarative Querying at Scale 2018 VLDB 7.1829142e-05
3,375 Query Shredding: Efficient Relational Evaluation of Queries over Nested Multisets 2014 SIGMOD 7.1633324e-05
3,377 Demonstration of the Myria Big Data Management Service 2014 SIGMOD 7.1624478e-05
3,455 A Comparison of Platforms for Implementing and Running Very Large Scale Machine Learning Algorithms 2014 SIGMOD 7.0771839e-05
3,504 M3R: Increased Performance for In-Memory Hadoop Jobs 2012 VLDB 7.0347515e-05
3,517 Integrating Hadoop and Parallel DBMS 2010 SIGMOD 7.0199423e-05
3,548 Adaptive Query Processing on RAW Data 2014 VLDB 6.9859242e-05
3,601 Large-Scale Machine Learning at Twitter 2012 SIGMOD 6.9315087e-05
3,700 RAMP: A System for Capturing and Tracing Provenance in MapReduce Workflows 2011 VLDB 6.8307955e-05
3,703 Multi-Query Optimization in MapReduce Framework 2014 VLDB 6.8289978e-05
3,710 Optimizing Analytic Data Flows for Multiple Execution Engines 2012 SIGMOD 6.8238962e-05
3,922 Pushing Data-Induced Predicates Through Joins in Big-Data Clusters 2020 VLDB 6.6291079e-05
4,046 WANalytics: Geo-Distributed Analytics for a Data Intensive World 2015 SIGMOD 6.4979392e-05
4,061 Advanced Partitioning Techniques for Massively Distributed Computation 2012 SIGMOD 6.483587e-05
4,132 Advanced Join Strategies for Large-Scale Distributed Computation 2014 VLDB 6.4241067e-05
4,188 Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications 2015 SIGMOD 6.3753681e-05
4,201 Meet Charles, big data query advisor 2013 CIDR 6.3639451e-05
4,326 Fast Queries Over Heterogeneous Data Through Engine Customization 2016 VLDB 6.288323e-05
4,425 Nova: Continuous Pig/Hadoop Workflows 2011 SIGMOD 6.198382e-05
4,493 ASTERIX: An Open Source System for "Big Data" Management and Analysis (Demo) 2012 VLDB 6.141595e-05
4,572 The Unified Logging Infrastructure for Data Analytics at Twitter 2012 VLDB 6.0760183e-05
4,689 Algorithmic Aspects of Parallel Query Processing 2018 SIGMOD 5.9980099e-05
4,700 Schedule Optimization for Data Processing Flows on the Cloud 2011 SIGMOD 5.9882572e-05
4,774 LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems 2021 SIGMOD 5.9316087e-05
4,857 The "Big Data" Ecosystem at LinkedIn 2013 SIGMOD 5.8736144e-05
4,885 GraphJet: Real-Time Content Recommendations at Twitter 2016 VLDB 5.8534354e-05
5,030 Nanosecond Indexing of Graph Data With Hash Maps and VLists 2019 SIGMOD 5.7501994e-05
5,125 The Art of Balance: A RateupDB™ Experience of Building a CPU/GPU Hybrid Database Product 2021 VLDB 5.679423e-05
5,294 GLADE: Big Data Analytics Made Easy 2012 SIGMOD 5.5810654e-05
5,297 Continuous Cloud-Scale Query Optimization and Processing 2013 VLDB 5.5801669e-05
5,301 ReCache: Reactive Caching for Fast Analytics over Heterogeneous Data 2018 VLDB 5.5790928e-05
5,453 Semistructured Models, Queries and Algebras in the Big Data Era 2016 SIGMOD 5.4989459e-05
5,731 Babelfish: Efficient Execution of Polyglot Queries 2022 VLDB 5.3502065e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
15 Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters 2007 SIGMOD 0.0010654262
18 On Random Sampling over Joins 1999 SIGMOD 0.00092385438

Semantically Similar Papers