Database Paper Browser


Pig Latin: A Not-So-Foreign Language for Data Processing

Summary: Pig Latin sits between declarative SQL and low-level MapReduce, letting analysts who prefer a procedural style express data flows without hand-writing MapReduce code. Pig compiles Pig Latin into Hadoop MapReduce plans and includes an integrated debugger. It is open source under the Apache Incubator and has been deployed at Yahoo scale. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 4057
Venue: SIGMOD
Year: 2008
Pagerank: 0.0024183614
Overall Rank: 3 | 99.99%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 154 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
5,790 | AQWA: Adaptive Query-Workload-Aware Partitioning of Big Spatial Data | 2015 | VLDB | 5.3269734e-05
5,806 | BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees | 2019 | SIGMOD | 5.3200643e-05
5,838 | HadoopDB in Action: Building Real World Applications | 2010 | SIGMOD | 5.3059032e-05
5,903 | Building Wavelet Histograms on Large Data in MapReduce | 2012 | VLDB | 5.2791351e-05
5,980 | The Era of Big Spatial Data | 2017 | VLDB | 5.2449608e-05
6,117 | REEF: Retainable Evaluator Execution Framework | 2015 | SIGMOD | 5.2036631e-05
6,131 | Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture | 2013 | SIGMOD | 5.1956688e-05
6,367 | Good to the Last Bit: Data-Driven Encoding with CodecDB | 2021 | SIGMOD | 5.0941072e-05
6,407 | Just-In-Time Data Virtualization: Lightweight Data Management with ViDa | 2015 | CIDR | 5.076547e-05
6,483 | Towards Unified Ad-hoc Data Processing | 2014 | SIGMOD | 5.0456397e-05
6,658 | Scalable Querying of Nested Data | 2021 | VLDB | 4.9711629e-05
6,821 | Hadoop's Adolescence: An analysis of Hadoop usage in scientific workloads | 2013 | VLDB | 4.9156923e-05
6,836 | An Algebraic Approach for Data-Centric Scientific Workflows | 2011 | VLDB | 4.9114673e-05
7,067 | JetScope: Reliable and Interactive Analytics at Cloud Scale | 2015 | VLDB | 4.8440936e-05
7,112 | Wide Table Layout Optimization based on Column Ordering and Duplication | 2017 | SIGMOD | 4.8275068e-05
7,198 | BSMA: A Benchmark for Analytical Queries over Social Media Data | 2014 | VLDB | 4.8033496e-05
7,207 | Kodiak: Leveraging Materialized Views For Very Low-Latency Analytics Over High-Dimensional Web-Scale Data | 2016 | VLDB | 4.800763e-05
7,264 | Online Expansion of Large-scale Data Warehouses | 2011 | VLDB | 4.7842311e-05
7,270 | Oracle In-Database Hadoop: When MapReduce Meets RDBMS | 2012 | SIGMOD | 4.7813984e-05
7,294 | Optimization for iterative queries on MapReduce | 2014 | VLDB | 4.773119e-05
7,534 | Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams | 2022 | VLDB | 4.7180004e-05
7,818 | A Survey and Experimental Comparison of Distributed SPARQL Engines for Very Large RDF Data | 2017 | VLDB | 4.6434716e-05
7,877 | Emerging Trends in the Enterprise Data Analytics: Connecting Hadoop and DB2 Warehouse | 2011 | SIGMOD | 4.6297559e-05
7,902 | Building Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis | 2015 | CIDR | 4.6215911e-05
7,953 | Shasta: Interactive Reporting At Scale | 2016 | SIGMOD | 4.613363e-05
7,960 | Building Community-Centric Information Exploration Applications on Social Content Sites | 2009 | SIGMOD | 4.613363e-05
8,401 | Toward Progress Indicators on Steroids for Big Data Systems | 2013 | CIDR | 4.5250912e-05
8,429 | Handling Environments in a Nested Relational Algebra with Combinators and an Implementation in a Verified Query Compiler | 2017 | SIGMOD | 4.5156925e-05
8,464 | Piranha: Optimizing Short Jobs in Hadoop | 2013 | VLDB | 4.5052127e-05
8,790 | From SPARQL to MapReduce: The Journey Using a Nested TripleGroup Algebra | 2011 | VLDB | 4.4508494e-05
8,924 | QMapper for Smart Grid: Migrating SQL-based Application to Hive | 2015 | SIGMOD | 4.427232e-05
8,978 | SpongeFiles: Mitigating Data Skew in MapReduce Using Distributed Memory | 2014 | SIGMOD | 4.417225e-05
9,001 | The Power of Nested Parallelism in Big Data Processing – Hitting Three Flies with One Slap – | 2021 | SIGMOD | 4.4107627e-05
9,004 | DataGarage: Warehousing Massive Performance Data on Commodity Servers | 2010 | VLDB | 4.4102022e-05
9,347 | Rank Join Queries in NoSQL Databases | 2014 | VLDB | 4.3526718e-05
9,376 | Versatile Optimization of UDF-heavy Data Flows with Sofa | 2014 | SIGMOD | 4.347376e-05
9,519 | PAXQuery: Parallel Analytical XML Processing | 2015 | SIGMOD | 4.3323764e-05
9,613 | Graft: A Debugging Tool For Apache Giraph | 2015 | SIGMOD | 4.3177432e-05
11,197 | QaaD (Query-as-a-Data): Scalable Execution of Massive Number of Small Queries in Spark | 2023 | SIGMOD | 4.1945683e-05
11,213 | Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control | 2023 | SIGMOD | 4.1945683e-05
11,690 | Integration of Large-Scale Data Processing Systems and Traditional Parallel Database Technology | 2019 | VLDB | 4.1945683e-05
11,831 | Logical Aspects of Massively Parallel and Distributed Systems | 2016 | PODS | 4.1945683e-05
11,859 | dmapply: A functional primitive to express distributed machine learning algorithms in R | 2016 | VLDB | 4.1945683e-05
11,882 | Parallel Evaluation of Multi-Semi-Joins | 2016 | VLDB | 4.1945683e-05
11,890 | Let's Rethink Join Optimization in Distributed Systems | 2015 | CIDR | 4.1945683e-05
11,894 | Building Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis | 2015 | CIDR | 4.1945683e-05
11,916 | A Demonstration of Rubato DB: A Highly Scalable NewSQL Database System for OLTP and Big Data Applications | 2015 | SIGMOD | 4.1945683e-05
11,919 | ShareInsights - An Unified Approach to Full-stack Data Processing | 2015 | SIGMOD | 4.1945683e-05
11,976 | Anti-Combining for MapReduce | 2014 | SIGMOD | 4.1945683e-05
12,109 | Declarative Error Management for Robust Data-Intensive Applications | 2012 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
15 | Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters | 2007 | SIGMOD | 0.0010654262
18 | On Random Sampling over Joins | 1999 | SIGMOD | 0.00092385438

Semantically Similar Papers