Database Paper Browser


The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward

Summary: The paper traces how Cosmos, Microsoft's exabyte-scale big data platform, evolved over a decade across reliability, scale, efficiency, and usability, and outlines next steps toward security, compliance, and heterogeneous analytics. It links the evolution of Cosmos workloads to broad big-data trends, offering platform-driven design insights for researchers. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12519
Venue: VLDB
Year: 2021
Pagerank: 5.1350714e-05
Overall Rank: 6,261 | 56.45%
DOI: 10.14778/3476311.3476390

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 10 of 10 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 32 of 32 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
183 | Automatic Database Management System Tuning Through Large-scale Machine Learning | 2017 | SIGMOD | 0.00036721403
329 | Accelerating Machine Learning Inference with Probabilistic Predicates | 2018 | SIGMOD | 0.00027249545
746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979
801 | SageDB: A Learned Database System | 2019 | CIDR | 0.00016505496
1,071 | Starfish: A Self-tuning System for Big Data Analytics | 2011 | CIDR | 0.00014312777
1,098 | Trill: A High-Performance Incremental Query Processor for Diverse Analytics | 2015 | VLDB | 0.00014114442
1,323 | Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters | 2016 | SIGMOD | 0.00012601997
1,377 | Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics | 2021 | CIDR | 0.00012296941
1,922 | Selecting Subexpressions to Materialize at Datacenter Scale | 2018 | VLDB | 0.00010082599
2,083 | Towards a Learning Optimizer for Shared Clouds | 2019 | VLDB | 9.5834572e-05
2,413 | Automated Partitioning Design in Parallel Database Systems | 2011 | SIGMOD | 8.8672223e-05
2,658 | Data Warehousing and Analytics Infrastructure at Facebook | 2010 | SIGMOD | 8.3607429e-05
2,954 | Magpie: Python at Speed and Scale using Cloud Backends | 2021 | CIDR | 7.8262582e-05
3,038 | Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics | 2017 | SIGMOD | 7.6717218e-05
3,141 | ClusterJoin: A Similarity Joins Framework using Map-Reduce | 2014 | VLDB | 7.4829448e-05
3,625 | Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings | 2020 | SIGMOD | 6.9055212e-05
3,875 | Cloudy with High Chance of DBMS: A 10-year Prediction for Enterprise-Grade ML | 2020 | CIDR | 6.675257e-05
3,922 | Pushing Data-Induced Predicates Through Joins in Big-Data Clusters | 2020 | VLDB | 6.6291079e-05
4,061 | Advanced Partitioning Techniques for Massively Distributed Computation | 2012 | SIGMOD | 6.483587e-05
4,174 | Computation Reuse in Analytics Job Service at Microsoft | 2018 | SIGMOD | 6.3856219e-05
4,248 | Hyper Dimension Shuffle: Efficient Data Repartition at Petabyte Scale in SCOPE | 2019 | VLDB | 6.3247927e-05
4,572 | The Unified Logging Infrastructure for Data Analytics at Twitter | 2012 | VLDB | 6.0760183e-05
4,857 | The "Big Data" Ecosystem at LinkedIn | 2013 | SIGMOD | 5.8736144e-05
5,252 | Error-bounded Sampling for Analytics on Big Sparse Data | 2014 | VLDB | 5.6024389e-05
5,361 | Efficient Estimation of Inclusion Coefficient using HyperLogLog Sketches | 2018 | VLDB | 5.547935e-05
6,242 | Helios: Hyperscale Indexing for the Cloud & Edge | 2020 | VLDB | 5.1408379e-05
6,673 | Incorporating Super-Operators in Big-Data Query Optimizers | 2020 | VLDB | 4.966799e-05
6,757 | KEA: Tuning an Exabyte-Scale Data Infrastructure | 2021 | SIGMOD | 4.9372134e-05
7,387 | Bubble Execution: Resource-aware Reliable Analytics at Cloud Scale | 2018 | VLDB | 4.7438193e-05
7,684 | AutoToken: Predicting Peak Parallelism for Big Data Analytics at Microsoft | 2020 | VLDB | 4.6796855e-05
8,240 | Experiences with Approximating Queries in Microsoft’s Production Big-Data Clusters | 2019 | VLDB | 4.5522563e-05
9,528 | Winds from Seattle: Database Research Directions | 2020 | VLDB | 4.3294231e-05

Semantically Similar Papers