The Story of AWS Glue
Summary: Describes AWS Glue, a serverless ETL platform with fast-starting, auto-scaling Spark and Python jobs (backed by a custom resource manager), DynamicFrames for semi-structured data without a predefined schema, and a shuffle plugin that offloads shuffle data to object storage. The service also includes a Hive-compatible Data Catalog with crawlers and Glue Studio for visual ETL, which simplify scalable, extensible data-lake ingestion. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Mohit Saxena
2. Benjamin Sowell
3. Daiyan Alamgir
4. Nitin Bahadur
5. Bijay Bisht
6. Santosh Chandrachood
7. Chitti Keswani
8. G2 Krishnamoorthy
9. Austin Lee
10. Bohou Li
11. Zach Mitchell
12. Vaibhav Porwal
13. Maheedhar Reddy Chappidi
14. Brian Ross
15. Noritaka Sekiyama
16. Omer Zaki
17. Linchi Zhang
18. Mehul A. Shah
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,767 | The HANA Native Query Engine for Lakehouse Systems | 2025 | VLDB | 4.1945683e-05 |
| 10,772 | veDB-HTAP: a Highly Integrated, Efficient and Adaptive HTAP System | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 109 | Dremel: Interactive Analysis of Web-Scale Datasets | 2010 | VLDB | 0.00048186983 |
| 156 | Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases | 2017 | SIGMOD | 0.00040504295 |
| 746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979 |
| 1,284 | Amazon Redshift Re-invented | 2022 | SIGMOD | 0.00012837822 |
| 5,888 | Magnet: Push-based Shuffle Service for Large-scale Data Processing | 2020 | VLDB | 5.2873617e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,973 | Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing | 2019 | SIGMOD | 6.5758017e-05 |
| 5,531 | Presto: A Decade of SQL Analytics at Meta | 2023 | SIGMOD | 5.4549499e-05 |
| 746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979 |
| 11,389 | CDI-E: An Elastic Cloud Service for Data Engineering | 2022 | VLDB | 4.1945683e-05 |
| 11,668 | Cost-Effective, Workload-Adaptive Migration of Big Data Applications to the Cloud | 2019 | SIGMOD | 4.1945683e-05 |
| 10,889 | Off-the-shelf Data Analytics on Serverless | 2024 | CIDR | 4.1945683e-05 |
| 167 | The Snowflake Elastic Data Warehouse | 2016 | SIGMOD | 0.00039180521 |
| 1,284 | Amazon Redshift Re-invented | 2022 | SIGMOD | 0.00012837822 |
| 426 | Amazon Redshift and the Case for Simpler Data Warehouses | 2015 | SIGMOD | 0.00023594359 |
| 3,844 | The evolution of Amazon Redshift (extended abstract) | 2021 | VLDB | 6.7076451e-05 |