Pig Latin: A Not-So-Foreign Language for Data Processing

Summary: Pig Latin sits between SQL and MapReduce, letting analysts who prefer a procedural style express data flows without writing MapReduce code. Pig compiles Pig Latin into Hadoop MapReduce plans and offers an integrated debugger; it is open source as an Apache Incubator project and is used in Yahoo-scale deployments. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 4057
Venue: SIGMOD
Year: 2008
Pagerank: 0.0024183614
Overall Rank: 3 | 99.99%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 154 citing papers.

Overall Rank	Citing Paper	Year	Venue	Pagerank
5,790	AQWA: Adaptive Query-Workload-Aware Partitioning of Big Spatial Data	2015	VLDB	5.3269734e-05
5,806	BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees	2019	SIGMOD	5.3200643e-05
5,838	HadoopDB in Action: Building Real World Applications	2010	SIGMOD	5.3059032e-05
5,903	Building Wavelet Histograms on Large Data in MapReduce	2012	VLDB	5.2791351e-05
5,980	The Era of Big Spatial Data	2017	VLDB	5.2449608e-05
6,117	REEF: Retainable Evaluator Execution Framework	2015	SIGMOD	5.2036631e-05
6,131	Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture	2013	SIGMOD	5.1956688e-05
6,367	Good to the Last Bit: Data-Driven Encoding with CodecDB	2021	SIGMOD	5.0941072e-05
6,407	Just-In-Time Data Virtualization: Lightweight Data Management with ViDa	2015	CIDR	5.076547e-05
6,483	Towards Unified Ad-hoc Data Processing	2014	SIGMOD	5.0456397e-05
6,658	Scalable Querying of Nested Data	2021	VLDB	4.9711629e-05
6,821	Hadoop's Adolescence: An analysis of Hadoop usage in scientific workloads	2013	VLDB	4.9156923e-05
6,836	An Algebraic Approach for Data-Centric Scientific Workflows	2011	VLDB	4.9114673e-05
7,067	JetScope: Reliable and Interactive Analytics at Cloud Scale	2015	VLDB	4.8440936e-05
7,112	Wide Table Layout Optimization based on Column Ordering and Duplication	2017	SIGMOD	4.8275068e-05
7,198	BSMA: A Benchmark for Analytical Queries over Social Media Data	2014	VLDB	4.8033496e-05
7,207	Kodiak: Leveraging Materialized Views For Very Low-Latency Analytics Over High-Dimensional Web-Scale Data	2016	VLDB	4.800763e-05
7,264	Online Expansion of Large-scale Data Warehouses	2011	VLDB	4.7842311e-05
7,270	Oracle In-Database Hadoop: When MapReduce Meets RDBMS	2012	SIGMOD	4.7813984e-05
7,294	Optimization for iterative queries on MapReduce	2014	VLDB	4.773119e-05
7,534	Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams	2022	VLDB	4.7180004e-05
7,818	A Survey and Experimental Comparison of Distributed SPARQL Engines for Very Large RDF Data	2017	VLDB	4.6434716e-05
7,877	Emerging Trends in the Enterprise Data Analytics: Connecting Hadoop and DB2 Warehouse	2011	SIGMOD	4.6297559e-05
7,902	Building Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis	2015	CIDR	4.6215911e-05
7,953	Shasta: Interactive Reporting At Scale	2016	SIGMOD	4.613363e-05
7,960	Building Community-Centric Information Exploration Applications on Social Content Sites	2009	SIGMOD	4.613363e-05
8,401	Toward Progress Indicators on Steroids for Big Data Systems	2013	CIDR	4.5250912e-05
8,429	Handling Environments in a Nested Relational Algebra with Combinators and an Implementation in a Verified Query Compiler	2017	SIGMOD	4.5156925e-05
8,464	Piranha: Optimizing Short Jobs in Hadoop	2013	VLDB	4.5052127e-05
8,790	From SPARQL to MapReduce: The Journey Using a Nested TripleGroup Algebra	2011	VLDB	4.4508494e-05
8,924	QMapper for Smart Grid: Migrating SQL-based Application to Hive	2015	SIGMOD	4.427232e-05
8,978	SpongeFiles: Mitigating Data Skew in MapReduce Using Distributed Memory	2014	SIGMOD	4.417225e-05
9,001	The Power of Nested Parallelism in Big Data Processing – Hitting Three Flies with One Slap –	2021	SIGMOD	4.4107627e-05
9,004	DataGarage: Warehousing Massive Performance Data on Commodity Servers	2010	VLDB	4.4102022e-05
9,347	Rank Join Queries in NoSQL Databases	2014	VLDB	4.3526718e-05
9,376	Versatile Optimization of UDF-heavy Data Flows with Sofa	2014	SIGMOD	4.347376e-05
9,519	PAXQuery: Parallel Analytical XML Processing	2015	SIGMOD	4.3323764e-05
9,613	Graft: A Debugging Tool For Apache Giraph	2015	SIGMOD	4.3177432e-05
11,197	QaaD (Query-as-a-Data): Scalable Execution of Massive Number of Small Queries in Spark	2023	SIGMOD	4.1945683e-05
11,213	Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control	2023	SIGMOD	4.1945683e-05
11,690	Integration of Large-Scale Data Processing Systems and Traditional Parallel Database Technology	2019	VLDB	4.1945683e-05
11,831	Logical Aspects of Massively Parallel and Distributed Systems	2016	PODS	4.1945683e-05
11,859	dmapply: A functional primitive to express distributed machine learning algorithms in R	2016	VLDB	4.1945683e-05
11,882	Parallel Evaluation of Multi-Semi-Joins	2016	VLDB	4.1945683e-05
11,890	Let's Rethink Join Optimization in Distributed Systems	2015	CIDR	4.1945683e-05
11,894	Building Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis	2015	CIDR	4.1945683e-05
11,916	A Demonstration of Rubato DB: A Highly Scalable NewSQL Database System for OLTP and Big Data Applications	2015	SIGMOD	4.1945683e-05
11,919	ShareInsights - An Unified Approach to Full-stack Data Processing	2015	SIGMOD	4.1945683e-05
11,976	Anti-Combining for MapReduce	2014	SIGMOD	4.1945683e-05
12,109	Declarative Error Management for Robust Data-Intensive Applications	2012	SIGMOD	4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Overall Rank	Cited Paper	Year	Venue	Pagerank
15	Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters	2007	SIGMOD	0.0010654262
18	On Random Sampling over Joins	1999	SIGMOD	0.00092385438

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
8,790	From SPARQL to MapReduce: The Journey Using a Nested TripleGroup Algebra	2011	VLDB	4.4508494e-05
4,857	The "Big Data" Ecosystem at LinkedIn	2013	SIGMOD	5.8736144e-05
11,690	Integration of Large-Scale Data Processing Systems and Traditional Parallel Database Technology	2019	VLDB	4.1945683e-05
1,265	Jaql: A Scripting Language for Large Scale Semistructured Data Analysis	2011	VLDB	0.00012947629
12,125	ReStore: Reusing Results of MapReduce Jobs in Pig	2012	SIGMOD	4.1945683e-05
70	Hive - A Warehousing Solution Over a Map-Reduce Framework	2009	VLDB	0.00059533166
13,426	The Farm - where Pig Scripts are bred and raised	2013	SIGMOD	-
3,601	Large-Scale Machine Learning at Twitter	2012	SIGMOD	6.9315087e-05
4,425	Nova: Continuous Pig/Hadoop Workflows	2011	SIGMOD	6.198382e-05
780	Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience	2009	VLDB	0.00016775082