Database Paper Browser

SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets

Summary: SCOPE is a declarative, extensible scripting language for analyzing massive data sets on large clusters; it hides parallelism from users. Its SQL-like core supports joins and aggregation, and users can extend it with their own operators (extractors, processors, reducers, combiners). Scripts may nest expressions or spell out a computation step by step, and are compiled into parallel execution plans. (summarized by gpt-5-nano on Feb 09 2026)
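To make the four user-defined operator kinds in the summary concrete, here is a minimal single-process sketch in Python. It is not SCOPE syntax and does not reflect the paper's implementation or any parallel execution; the function names (`extract`, `process`, `reduce_by`, `combine`), the tab-separated input format, and the sample data are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]


def extract(lines: Iterable[str]) -> Iterator[Row]:
    """Extractor: parse raw text lines into rows (assumed format: user<TAB>query)."""
    for line in lines:
        user, query = line.rstrip("\n").split("\t", 1)
        yield {"user": user, "query": query}


def process(rows: Iterable[Row]) -> Iterator[Row]:
    """Processor: row-by-row transformation; drops empty queries, lowercases the rest."""
    for row in rows:
        if row["query"]:
            yield {**row, "query": str(row["query"]).lower()}


def reduce_by(rows: Iterable[Row], key: str) -> Iterator[Row]:
    """Reducer: group rows by `key` and emit one row per group with its count."""
    counts: Dict[object, int] = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    for value, n in sorted(counts.items()):
        yield {key: value, "count": n}


def combine(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """Combiner: merge rows from two inputs that agree on `key` (a hash equi-join)."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    for l in left:
        for r in index[l[key]]:
            yield {**l, **r}


if __name__ == "__main__":
    # Hypothetical input: a tiny query log and a lookup table.
    log = ["alice\tSCOPE", "bob\tscope", "alice\tDryad"]
    categories = [{"query": "scope", "category": "language"}]

    counts = list(reduce_by(process(extract(log)), "query"))
    for row in combine(counts, categories, "query"):
        print(row)
```

Each function consumes and produces a stream of rows, so the stages compose as a pipeline. In a system like the one summarized above, a compiler would decide how to partition and parallelize such stages, which this sketch leaves out entirely.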

Paper ID: 9752
Venue: VLDB
Year: 2008
Pagerank: 0.0008456613
Overall Rank: 22 | 99.85%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 89 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
42 | A Comparison of Approaches to Large-Scale Data Analysis | 2009 | SIGMOD | 0.00073498298
70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166
109 | Dremel: Interactive Analysis of Web-Scale Datasets | 2010 | VLDB | 0.00048186983
157 | HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads | 2009 | VLDB | 0.00040397359
329 | Accelerating Machine Learning Inference with Probabilistic Predicates | 2018 | SIGMOD | 0.00027249545
542 | Shark: SQL and Rich Analytics at Scale | 2013 | SIGMOD | 0.00020595648
660 | Large Graph Processing in the Cloud | 2010 | SIGMOD | 0.00018493984
711 | A Case for A Collaborative Query Management System | 2009 | CIDR | 0.00017751589
780 | Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience | 2009 | VLDB | 0.00016775082
794 | Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) | 2010 | VLDB | 0.00016605103
886 | Fast Personalized PageRank on MapReduce | 2011 | SIGMOD | 0.00015597161
913 | Tenzing A SQL Implementation On The MapReduce Framework | 2011 | VLDB | 0.00015408131
947 | MRShare: Sharing Across Multiple Queries in MapReduce | 2010 | VLDB | 0.00015114576
960 | A Comparison of Join Algorithms for Log Processing in MapReduce | 2010 | SIGMOD | 0.00015012242
1,098 | Trill: A High-Performance Incremental Query Processor for Diverse Analytics | 2015 | VLDB | 0.00014114442
1,110 | Parallel Evaluation of Conjunctive Queries | 2011 | PODS | 0.00013968198
1,158 | Simulation of Database-Valued Markov Chains Using SimSQL | 2013 | SIGMOD | 0.0001361064
1,261 | Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce | 2013 | VLDB | 0.00012989236
1,265 | Jaql: A Scripting Language for Large Scale Semistructured Data Analysis | 2011 | VLDB | 0.00012947629
1,323 | Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters | 2016 | SIGMOD | 0.00012601997
1,355 | SQL/MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions | 2009 | VLDB | 0.00012404572
1,534 | PerfXplain: Debugging MapReduce Job Performance | 2012 | VLDB | 0.00011468393
1,574 | Approximate Query Processing: No Silver Bullet | 2017 | SIGMOD | 0.00011287495
1,615 | The Performance of MapReduce: An In-depth Study | 2010 | VLDB | 0.00011132319
1,721 | Distributed Data-Parallel Computing Using a High-Level Programming Language | 2009 | SIGMOD | 0.00010762918
1,846 | Combining User Interaction, Speculative Query Execution and Sampling in the DICE System | 2014 | VLDB | 0.00010335419
1,873 | An Architecture for Compiling UDF-centric Workflows | 2015 | VLDB | 0.00010253002
2,083 | Towards a Learning Optimizer for Shared Clouds | 2019 | VLDB | 9.5834572e-05
2,172 | Spinning Fast Iterative Data Flows | 2012 | VLDB | 9.3706587e-05
2,249 | Orca: A Modular Query Optimizer Architecture for Big Data | 2014 | SIGMOD | 9.2034693e-05
2,413 | Automated Partitioning Design in Parallel Database Systems | 2011 | SIGMOD | 8.8672223e-05
2,418 | Tupleware: "Big" Data, Big Analytics, Small Clusters | 2015 | CIDR | 8.8556595e-05
2,476 | A Platform for Scalable One-Pass Analytics using MapReduce | 2011 | SIGMOD | 8.6960139e-05
2,545 | POLARIS: The Distributed SQL Engine in Azure Synapse | 2020 | VLDB | 8.5725413e-05
2,611 | Opening the Black Boxes in Data Flow Optimization | 2012 | VLDB | 8.4536967e-05
2,674 | Minimal MapReduce Algorithms | 2013 | SIGMOD | 8.3328645e-05
2,817 | Recurring Job Optimization in Scope | 2012 | SIGMOD | 8.0677653e-05
2,818 | Implicit Parallelism through Deep Language Embedding | 2015 | SIGMOD | 8.0665558e-05
3,038 | Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics | 2017 | SIGMOD | 7.6717218e-05
3,141 | ClusterJoin: A Similarity Joins Framework using Map-Reduce | 2014 | VLDB | 7.4829448e-05
3,348 | Lero: A Learning-to-Rank Query Optimizer | 2023 | VLDB | 7.1904529e-05
3,517 | Integrating Hadoop and Parallel DBMS | 2010 | SIGMOD | 7.0199423e-05
3,550 | Chi: A Scalable and Programmable Control Plane for Distributed Stream Processing Systems | 2018 | VLDB | 6.9843512e-05
3,625 | Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings | 2020 | SIGMOD | 6.9055212e-05
3,922 | Pushing Data-Induced Predicates Through Joins in Big-Data Clusters | 2020 | VLDB | 6.6291079e-05
4,061 | Advanced Partitioning Techniques for Massively Distributed Computation | 2012 | SIGMOD | 6.483587e-05
4,174 | Computation Reuse in Analytics Job Service at Microsoft | 2018 | SIGMOD | 6.3856219e-05
4,248 | Hyper Dimension Shuffle: Efficient Data Repartition at Petabyte Scale in SCOPE | 2019 | VLDB | 6.3247927e-05
4,689 | Algorithmic Aspects of Parallel Query Processing | 2018 | SIGMOD | 5.9980099e-05
4,690 | Deploying a Steered Query Optimizer in Production at Microsoft | 2022 | SIGMOD | 5.997226e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614
15 | Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters | 2007 | SIGMOD | 0.0010654262