AutoToken: Predicting Peak Parallelism for Big Data Analytics at Microsoft
Summary: AutoToken predicts peak resource usage for recurring big-data queries in serverless analytics. It is a lightweight, scalable predictor that uses multiple query-plan identifiers to detect recurring templates; it is integrated with Peregrine and validated on SCOPE jobs. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Rathijit Sen
2. Alekh Jindal
3. Hiren Patel
4. Shi Qiao
Incoming Citations (Sorted by Pagerank)
Showing 10 of 10 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613 |
| 953 | Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance | 2010 | VLDB | 0.00015095431 |
| 1,071 | Starfish: A Self-tuning System for Big Data Analytics | 2011 | CIDR | 0.00014312777 |
| 2,083 | Towards a Learning Optimizer for Shared Clouds | 2019 | VLDB | 0.000095834572 |
| 3,625 | Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings | 2020 | SIGMOD | 0.000069055212 |
| 4,174 | Computation Reuse in Analytics Job Service at Microsoft | 2018 | SIGMOD | 0.000063856219 |
| 9,735 | SparkCruise: Handsfree Computation Reuse in Spark | 2019 | VLDB | 0.000042942813 |