Database Paper Browser

Pig Latin: A Not-So-Foreign Language for Data Processing

Summary: Pig Latin sits between declarative SQL and low-level MapReduce, letting analysts who prefer a procedural style express data flows step by step without writing MapReduce code by hand. Pig compiles Pig Latin programs into Hadoop MapReduce plans and includes an integrated debugger. Pig is open source, released under the Apache Incubator, and is deployed at Yahoo scale. (summarized by gpt-5-nano on Feb 09 2026)
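
To make the "step-by-step data flow" idea in the summary concrete, here is a minimal Python sketch, not Pig itself. It mimics a filter, group, filter, aggregate pipeline in which every intermediate result gets a name. The helper names (`filter_rows`, `group_by`, `run_flow`), the thresholds, and the toy `urls` records are all assumptions made for illustration; nothing here reflects how Pig compiles or executes programs.

```python
from collections import defaultdict
from statistics import mean


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def group_by(rows, key):
    """Collect rows into lists keyed by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return dict(groups)


def run_flow(urls, min_pagerank=0.2, min_group_size=2):
    """Hypothetical data flow: each step names its intermediate result."""
    # Step 1: keep only rows with a sufficiently high pagerank.
    good_urls = filter_rows(urls, lambda r: r["pagerank"] > min_pagerank)
    # Step 2: group the surviving rows by category.
    groups = group_by(good_urls, lambda r: r["category"])
    # Step 3: keep only categories with enough members.
    big_groups = {c: rs for c, rs in groups.items() if len(rs) >= min_group_size}
    # Step 4: emit the average pagerank per remaining category.
    return {c: mean(r["pagerank"] for r in rs) for c, rs in big_groups.items()}


# Toy input, invented for this example.
urls = [
    {"url": "a.example", "category": "news", "pagerank": 0.5},
    {"url": "b.example", "category": "news", "pagerank": 0.3},
    {"url": "c.example", "category": "news", "pagerank": 0.1},
    {"url": "d.example", "category": "sports", "pagerank": 0.9},
]
result = run_flow(urls)
```

Each named intermediate (`good_urls`, `groups`, `big_groups`) corresponds to one step of the flow. That is the procedural style the summary contrasts with a single declarative SQL query.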

Paper ID: 4057
Venue: SIGMOD
Year: 2008
Pagerank: 0.0024183614
Overall Rank: 3 | 99.99%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 154 citing papers.

Rank Citing Paper Year Venue Pagerank
4 Pregel: A System for Large-Scale Graph Processing 2010 SIGMOD 0.0019005923
22 SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets 2008 VLDB 0.0008456613
42 A Comparison of Approaches to Large-Scale Data Analysis 2009 SIGMOD 0.00073498298
53 PNUTS: Yahoo!'s Hosted Data Serving Platform 2008 VLDB 0.00066144767
66 Spark SQL: Relational Data Processing in Spark 2015 SIGMOD 0.00061639801
109 Dremel: Interactive Analysis of Web-Scale Datasets 2010 VLDB 0.00048186983
157 HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads 2009 VLDB 0.00040397359
396 One Trillion Edges: Graph Processing at Facebook-Scale 2015 VLDB 0.00024424102
413 HaLoop: Efficient Iterative Data Processing on Large Clusters 2010 VLDB 0.00023904409
543 MLbase: A Distributed Machine-learning System 2013 CIDR 0.00020526854
544 Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources 2018 SIGMOD 0.00020521965
711 A Case for A Collaborative Query Management System 2009 CIDR 0.00017751589
780 Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience 2009 VLDB 0.00016775082
794 Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) 2010 VLDB 0.00016605103
886 Fast Personalized PageRank on MapReduce 2011 SIGMOD 0.00015597161
913 Tenzing A SQL Implementation On The MapReduce Framework 2011 VLDB 0.00015408131
947 MRShare: Sharing Across Multiple Queries in MapReduce 2010 VLDB 0.00015114576
960 A Comparison of Join Algorithms for Log Processing in MapReduce 2010 SIGMOD 0.00015012242
1,074 Processing Theta-Joins using MapReduce 2011 SIGMOD 0.00014260096
1,076 RIOT: I/O-Efficient Numerical Computing without SQL 2009 CIDR 0.00014248449
1,261 Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce 2013 VLDB 0.00012989236
1,265 Jaql: A Scripting Language for Large Scale Semistructured Data Analysis 2011 VLDB 0.00012947629
1,280 Automatic Optimization for MapReduce Programs 2011 VLDB 0.0001285503
1,294 Distributed SociaLite: A Datalog-Based Language for Large-Scale Graph Analysis 2013 VLDB 0.00012779484
1,323 Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters 2016 SIGMOD 0.00012601997
1,355 SQL/MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions 2009 VLDB 0.00012404572
1,402 Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML 2014 VLDB 0.00012180605
1,411 Communication Steps for Parallel Query Processing 2013 PODS 0.0001212565
1,435 Simba: Efficient In-Memory Spatial Analytics 2016 SIGMOD 0.00012004456
1,440 Provenance for Generalized Map and Reduce Workflows 2011 CIDR 0.00011961469
1,459 Query From Examples: An Iterative, Data-Driven Approach to Query Construction 2015 VLDB 0.00011889802
1,495 Ricardo: Integrating R and Hadoop 2010 SIGMOD 0.00011691049
1,534 PerfXplain: Debugging MapReduce Job Performance 2012 VLDB 0.00011468393
1,574 Approximate Query Processing: No Silver Bullet 2017 SIGMOD 0.00011287495
1,615 The Performance of MapReduce: An In-depth Study 2010 VLDB 0.00011132319
1,666 HELIX: Holistic Optimization for Accelerating Iterative Machine Learning 2019 VLDB 0.0001096361
1,715 V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors 2012 VLDB 0.00010803271
1,721 Distributed Data-Parallel Computing Using a High-Level Programming Language 2009 SIGMOD 0.00010762918
1,727 BigBench: Towards an Industry Standard Benchmark for Big Data Analytics 2013 SIGMOD 0.00010740936
1,770 ParaTimer: A Progress Indicator for MapReduce DAGs 2010 SIGMOD 0.00010618229
1,794 Summingbird: A Framework for Integrating Batch and Online MapReduce Computations 2014 VLDB 0.00010532024
1,846 Combining User Interaction, Speculative Query Execution and Sampling in the DICE System 2014 VLDB 0.00010335419
2,021 Storage Management in AsterixDB 2014 VLDB 0.000097601304
2,027 Titian: Data Provenance Support in Spark 2016 VLDB 0.000097437067
2,028 Putting Lipstick on Pig: Enabling Database-style Workflow Provenance 2012 VLDB 0.000097433981
2,035 Generating Example Data for Dataflow Programs 2009 SIGMOD 0.000097149269
2,205 ReStore: Reusing Results of MapReduce Jobs 2012 VLDB 0.000092920002
2,208 Clustera: An Integrated Computation And Data Management System 2008 VLDB 0.000092873257
2,212 Skew in Parallel Query Processing 2014 PODS 0.000092771827
2,300 A Demonstration of SpatialHadoop: An Efficient MapReduce Framework for Spatial Data 2013 VLDB 0.000090677864

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
15 Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters 2007 SIGMOD 0.0010654262
18 On Random Sampling over Joins 1999 SIGMOD 0.00092385438

Semantically Similar Papers