Database Paper Browser

SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets

Summary: SCOPE is a declarative, extensible scripting language for analyzing massive data sets on clusters without requiring users to express parallelism explicitly. It borrows from SQL, supporting joins and aggregation, and lets users define their own operators (extractors, processors, reducers, combiners). Scripts may nest expressions or be written as a series of steps, and are compiled into parallel execution plans. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 9752
Venue: VLDB
Year: 2008
Pagerank: 0.0008456613
Overall Rank: 22 | 99.85%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 39 of 89 citing papers.

Rank Citing Paper Year Venue Pagerank
4,822 Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka 2021 SIGMOD 5.8959131e-05
4,951 Mining Document Collections to Facilitate Accurate Approximate Entity Matching 2009 VLDB 5.8100413e-05
5,045 Massive Scale-out of Expensive Continuous Queries 2011 VLDB 5.740793e-05
5,096 Auto-Transform: Learning-to-Transform by Patterns 2020 VLDB 5.7011825e-05
5,252 Error-bounded Sampling for Analytics on Big Sparse Data 2014 VLDB 5.6024389e-05
5,257 Probabilistic Demand Forecasting at Scale 2017 VLDB 5.6003925e-05
5,361 Efficient Estimation of Inclusion Coefficient using HyperLogLog Sketches 2018 VLDB 5.547935e-05
5,794 Discovering Related Data At Scale 2021 VLDB 5.3245122e-05
5,903 Building Wavelet Histograms on Large Data in MapReduce 2012 VLDB 5.2791351e-05
6,040 Steering Query Optimizers: A Practical Take on Big Data Workloads 2021 SIGMOD 5.2412035e-05
6,209 AutoExecutor: Predictive Parallelism for Spark SQL Queries 2021 VLDB 5.1565972e-05
6,242 Helios: Hyperscale Indexing for the Cloud & Edge 2020 VLDB 5.1408379e-05
6,590 Interactive Demonstration of Probabilistic Predicates 2018 SIGMOD 5.0010949e-05
6,658 Scalable Querying of Nested Data 2021 VLDB 4.9711629e-05
6,757 KEA: Tuning an Exabyte-Scale Data Infrastructure 2021 SIGMOD 4.9372134e-05
6,885 PilotScope: Steering Databases with Machine Learning Drivers 2024 VLDB 4.895386e-05
7,387 Bubble Execution: Resource-aware Reliable Analytics at Cloud Scale 2018 VLDB 4.7438193e-05
7,534 Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams 2022 VLDB 4.7180004e-05
7,684 AutoToken: Predicting Peak Parallelism for Big Data Analytics at Microsoft 2020 VLDB 4.6796855e-05
7,778 Runtime Variation in Big Data Analytics 2023 SIGMOD 4.653651e-05
7,833 Dependency-Driven Analytics: a Compass for Uncharted Data Oceans 2017 CIDR 4.6382648e-05
7,838 Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes 2021 SIGMOD 4.6377995e-05
8,002 Pangea: Monolithic Distributed Storage for Data Analytics 2019 VLDB 4.6088289e-05
8,197 SparkCruise: Workload Optimization in Managed Spark Clusters at Microsoft 2021 VLDB 4.5607121e-05
8,217 Spur: Mitigating Slow Instances in Large-Scale Streaming Pipelines 2020 SIGMOD 4.5568298e-05
8,220 PerfGuard: Deploying ML-for-Systems without Performance Regressions, Almost! 2021 VLDB 4.5557328e-05
8,240 Experiences with Approximating Queries in Microsoft’s Production Big-Data Clusters 2019 VLDB 4.5522563e-05
8,582 Towards Query Optimizer as a Service (QOaaS) in a Unified LakeHouse Ecosystem: Can One QO Rule Them All? 2025 CIDR 4.492033e-05
8,818 Couchbase Analytics: NoETL for Scalable NoSQL Data Analysis 2019 VLDB 4.4427924e-05
9,004 DataGarage: Warehousing Massive Performance Data on Commodity Servers 2010 VLDB 4.4102022e-05
9,194 Phoebe: A Learning-based Checkpoint Optimizer 2021 VLDB 4.3761777e-05
9,330 Parallel Query Processing: To Separate Communication from Computation 2022 SIGMOD 4.3556432e-05
9,547 Optimistic Recovery for Iterative Dataflows in Action 2015 SIGMOD 4.3259935e-05
10,404 Dynamic Pruning for Recursive Joins 2025 SIGMOD 4.1945683e-05
10,932 Flux: Decoupled Auto-Scaling for Heterogeneous Query Workload in Alibaba AnalyticDB 2024 SIGMOD 4.1945683e-05
11,531 Fangorn: Adaptive Execution Framework for Heterogeneous Workloads on Shared Clusters 2021 VLDB 4.1945683e-05
11,690 Integration of Large-Scale Data Processing Systems and Traditional Parallel Database Technology 2019 VLDB 4.1945683e-05
12,109 Declarative Error Management for Robust Data-Intensive Applications 2012 SIGMOD 4.1945683e-05
12,226 Indexing Multi-dimensional Data in a Cloud System 2010 SIGMOD 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
3 Pig Latin: A Not-So-Foreign Language for Data Processing 2008 SIGMOD 0.0024183614
15 Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters 2007 SIGMOD 0.0010654262

Semantically Similar Papers