Magpie: Python at Speed and Scale using Cloud Backends
Summary: Magpie exposes the Pandas API but lazily pushes dataframe work into cloud query engines (SQL DW, Spark, SCOPE) through a common data layer, avoiding cross-engine transfer and leveraging DB-grade features. It auto-selects optimal backends to deliver database-scale performance to Python analytics; production traces show ~25% of internal computations could benefit.
(summarized by gpt-5-mini on Feb 09 2026)
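The lazy-pushdown idea in the summary can be illustrated with a toy sketch. This is not Magpie's implementation or API: `LazyFrame`, its `filter`/`select`/`to_sql`/`collect` methods, and the `trips` table are hypothetical names invented here, and an in-memory SQLite database stands in for a cloud query engine. Magpie itself exposes the Pandas API, whereas this sketch takes raw SQL predicate strings to stay short. The point is only that operations are recorded first and sent to the backend as a single query when results are requested.

```python
import sqlite3


class LazyFrame:
    """Hypothetical lazy dataframe: records operations, runs nothing until collect()."""

    def __init__(self, table, filters=(), columns=None):
        self.table = table
        self.filters = tuple(filters)
        self.columns = columns

    def filter(self, predicate):
        # Record the predicate only; no query is issued yet.
        return LazyFrame(self.table, self.filters + (predicate,), self.columns)

    def select(self, *columns):
        # Record the projection only; no query is issued yet.
        return LazyFrame(self.table, self.filters, columns)

    def to_sql(self):
        # Compile all recorded operations into one SQL statement.
        cols = ", ".join(self.columns) if self.columns else "*"
        sql = f"SELECT {cols} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self.filters)
        return sql

    def collect(self, conn):
        # Only here does work reach the backend (SQLite as a stand-in engine).
        return conn.execute(self.to_sql()).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (city TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("NYC", 12.5), ("SEA", 30.0), ("NYC", 45.0)],
)

query = LazyFrame("trips").filter("city = 'NYC'").filter("fare > 20").select("city", "fare")
sql_text = query.to_sql()   # one statement covering both filters and the projection
rows = query.collect(conn)  # executed inside the database, not in Python
```

Because the filters and projection are fused into one statement, only the matching rows leave the database. The raw predicate strings are a simplification for this toy; a real system would build and validate expressions rather than concatenate SQL text.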
- Paper ID: 407
- Venue: CIDR
- Year: 2021
- Pagerank: 7.8262582e-05
- Overall Rank: 2,954 | 79.46%
- DOI:
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 16 of 16 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 3,393 | Lux: Always-on Visualization Recommendations for Exploratory Dataframe Workflows | 2022 | VLDB | 7.1483239e-05 |
| 3,407 | End-to-end Optimization of Machine Learning Prediction Queries | 2022 | SIGMOD | 7.1295646e-05 |
| 3,763 | Flexible Rule-Based Decomposition and Metadata Independence in Modin: A Parallel Dataframe System | 2022 | VLDB | 6.7801795e-05 |
| 4,773 | PolyFrame: A Retargetable Query-based Approach to Scaling Dataframes | 2021 | VLDB | 5.9320139e-05 |
| 5,731 | Babelfish: Efficient Execution of Polyglot Queries | 2022 | VLDB | 5.3502065e-05 |
| 6,261 | The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward | 2021 | VLDB | 5.1350714e-05 |
| 6,541 | ConnectorX: Accelerating Data Loading From Databases to Dataframes | 2022 | VLDB | 5.0216945e-05 |
| 6,701 | YeSQL: “You extend SQL” with Rich and Highly Performant User-Defined Functions in Relational Databases | 2022 | VLDB | 4.9561066e-05 |
| 6,895 | Decentralized Actor Scheduling and Reference-based Storage in Xorbits: a Native Scalable Data Science Engine | 2025 | VLDB | 4.8925595e-05 |
| 8,583 | Efficient Execution of User-Defined Functions in SQL Queries | 2023 | VLDB | 4.4919445e-05 |
| 8,645 | Predicate Pushdown for Data Science Pipelines | 2023 | SIGMOD | 4.4772518e-05 |
| 9,343 | The Key to Effective UDF Optimization: Before Inlining, First Perform Outlining | 2025 | VLDB | 4.3546206e-05 |
| 9,762 | QURE: AI-Assisted and Automatically Verified UDF Inlining | 2025 | SIGMOD | 4.2856106e-05 |
| 9,911 | Dias: Dynamic Rewriting of Pandas Code | 2024 | SIGMOD | 4.2565279e-05 |
| 10,931 | Proactive Resume and Pause of Resources for Microsoft Azure SQL Database Serverless | 2024 | SIGMOD | 4.1945683e-05 |
| 11,024 | SplitDF: Splitting Dataframes for Memory-Efficient Data Analysis | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 14 of 14 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 1,108 | Froid: Optimization of Imperative Programs in a Relational Database | 2018 | VLDB | 0.00013984276 |
| 1,427 | Towards Scalable Dataframe Systems | 2020 | VLDB | 0.0001204248 |
| 1,456 | Rewriting Procedures for Batched Bindings | 2008 | VLDB | 0.00011899772 |
| 1,630 | Garlic: A New Flavor of Federated Query Processing for DB2 | 2002 | SIGMOD | 0.0001108111 |
| 1,855 | AI Meets AI: Leveraging Query Executions to Improve Index Recommendations | 2019 | SIGMOD | 0.00010315245 |
| 2,934 | AIDA - Abstraction for Advanced In-Database Analytics | 2018 | VLDB | 7.8595778e-05 |
| 3,038 | Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics | 2017 | SIGMOD | 7.6717218e-05 |
| 3,296 | Extracting Equivalent SQL from Imperative Code in Database Applications | 2016 | SIGMOD | 7.2596583e-05 |
| 3,308 | Automatic Partitioning of Database Applications | 2012 | VLDB | 7.2422925e-05 |
| 3,625 | Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings | 2020 | SIGMOD | 6.9055212e-05 |
| 3,982 | The Myria Big Data Management and Analytics System and Cloud Service | 2017 | CIDR | 6.5651188e-05 |
| 4,166 | Sloth: Being Lazy is a Virtue (When Issuing Database Queries) | 2014 | SIGMOD | 6.391976e-05 |
| 7,047 | Seagull: An Infrastructure for Load Prediction and Optimized Resource Allocation | 2021 | VLDB | 4.8521181e-05 |
| 7,448 | DBridge: Translating Imperative Code to SQL | 2017 | SIGMOD | 4.7273104e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 2,067 | HippogriffDB: Balancing I/O and GPU Bandwidth in Big Data Analytics | 2016 | VLDB | 9.6392739e-05 |
| 6,604 | MotherDuck: DuckDB in the cloud and in the client | 2024 | CIDR | 4.9971118e-05 |
| 4,773 | PolyFrame: A Retargetable Query-based Approach to Scaling Dataframes | 2021 | VLDB | 5.9320139e-05 |
| 1,882 | Tuplex: Data Science in Python at Native Code Speed | 2021 | SIGMOD | 0.0001021625 |
| 4,717 | Cloud Analytics Benchmark | 2023 | VLDB | 5.9751539e-05 |
| 6,189 | Accelerating Python UDFs in Vectorized Query Execution | 2022 | CIDR | 5.1647573e-05 |
| 9,416 | When sweet and cute isn't enough anymore: Solving scalability issues in Python Pandas with Grizzly | 2020 | CIDR | 4.3441378e-05 |
| 4,813 | Putting Pandas in a Box | 2021 | CIDR | 5.9049746e-05 |
| 1,427 | Towards Scalable Dataframe Systems | 2020 | VLDB | 0.0001204248 |
| 1,326 | Starling: A Scalable Query Engine on Cloud Functions | 2020 | SIGMOD | 0.00012576952 |