Database Paper Browser

Towards Scalable Dataframe Systems

Summary: Reports on MODIN, a step toward scalable dataframe systems that scales pandas-like APIs, and proposes a simple dataframe data model and algebra. Signature dataframe features are flexible schemas, ordering, row/column equivalence, and data/metadata fluidity; together with a trial-and-error interaction model, they open new data-management research questions. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12097
Venue: VLDB
Year: 2020
Pagerank: 0.0001204248
Overall Rank: 1,427 | 90.08%
DOI: 10.14778/3407790.3407807

Authors

Incoming Citations (Sorted by Pagerank)

Showing 21 of 21 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,121 | Balsa: Learning a Query Optimizer Without Expert Demonstrations | 2022 | SIGMOD | 9.5017232e-05
2,954 | Magpie: Python at Speed and Scale using Cloud Backends | 2021 | CIDR | 7.8262582e-05
3,254 | Query Processing on Tensor Computation Runtimes | 2022 | VLDB | 7.3161051e-05
3,393 | Lux: Always-on Visualization Recommendations for Exploratory Dataframe Workflows | 2022 | VLDB | 7.1483239e-05
3,763 | Flexible Rule-Based Decomposition and Metadata Independence in Modin: A Parallel Dataframe System | 2022 | VLDB | 6.7801795e-05
4,239 | The Composable Data Management System Manifesto | 2023 | VLDB | 6.3318452e-05
4,773 | PolyFrame: A Retargetable Query-based Approach to Scaling Dataframes | 2021 | VLDB | 5.9320139e-05
5,307 | A Critique of Modern SQL And A Proposal Towards A Simple and Expressive Query Language | 2024 | CIDR | 5.5766594e-05
5,981 | DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python | 2021 | SIGMOD | 5.2448986e-05
6,541 | ConnectorX: Accelerating Data Loading From Databases to Dataframes | 2022 | VLDB | 5.0216945e-05
6,895 | Decentralized Actor Scheduling and Reference-based Storage in Xorbits: a Native Scalable Data Science Engine | 2025 | VLDB | 4.8925595e-05
8,163 | Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science | 2021 | VLDB | 4.5723431e-05
8,257 | Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines | 2023 | SIGMOD | 4.5487511e-05
8,514 | UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads | 2022 | VLDB | 4.4944285e-05
8,915 | DQDF: Data-Quality-Aware Dataframes | 2022 | VLDB | 4.427232e-05
9,912 | ElasticNotebook: Enabling Live Migration for Computational Notebooks | 2024 | VLDB | 4.2565279e-05
10,482 | Fast and Scalable Data Transfer Across Data Systems | 2025 | SIGMOD | 4.1945683e-05
10,591 | Accio: Bolt-on Query Federation | 2025 | VLDB | 4.1945683e-05
11,024 | SplitDF: Splitting Dataframes for Memory-Efficient Data Analysis | 2024 | VLDB | 4.1945683e-05
11,396 | DPDS: Assisting Data Science with Data Provenance | 2022 | VLDB | 4.1945683e-05
11,429 | Leam: An Interactive System for In-situ Visual Text Analysis | 2021 | CIDR | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 24 of 24 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
14 | Online Aggregation | 1997 | SIGMOD | 0.0010801504
66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801
112 | Potter's Wheel: An Interactive Data Cleaning System | 2001 | VLDB | 0.00047045036
179 | Efficient and Extensible Algorithms for Multi Query Optimization | 2000 | SIGMOD | 0.00037672155
185 | DuckDB: an Embeddable Analytical Database | 2019 | SIGMOD | 0.00036538405
515 | QPipe: A Simultaneously Pipelined Relational Query Engine | 2005 | SIGMOD | 0.00021214633
940 | SharedDB: Killing One Thousand Queries With One Stone | 2012 | VLDB | 0.00015173166
1,203 | PIVOT and UNPIVOT: Optimization and Execution Strategies in an RDBMS | 2004 | VLDB | 0.00013320373
1,204 | VerdictDB: Universalizing Approximate Query Processing | 2018 | SIGMOD | 0.00013319541
1,219 | Rate-Based Query Optimization for Streaming Information Sources | 2002 | SIGMOD | 0.00013223888
1,233 | Maximizing the Output Rate of Multi-Way Join Queries over Streaming Information Sources | 2003 | VLDB | 0.0001313363
1,383 | Querying XML Views of Relational Data | 2001 | VLDB | 0.00012270434
1,422 | SchemaSQL - A Language for Interoperability in Relational Multi-database Systems | 1996 | VLDB | 0.00012056887
1,666 | HELIX: Holistic Optimization for Accelerating Iterative Machine Learning | 2019 | VLDB | 0.0001096361
1,900 | Hash joins and hash teams in Microsoft SQL Server | 1998 | VLDB | 0.000101645
2,011 | Rapid Sampling for Visualizations with Ordering Guarantees | 2015 | VLDB | 9.7964875e-05
2,097 | Predictive Interaction for Data Transformation | 2015 | CIDR | 9.5489822e-05
2,365 | The Analytical Bootstrap: a New Method for Fast Error Estimation in Approximate Query Processing | 2014 | SIGMOD | 8.9551432e-05
2,580 | Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee | 2016 | SIGMOD | 8.5058814e-05
4,681 | Adaptive Sampling for Rapidly Matching Histograms | 2018 | VLDB | 6.0034918e-05
4,811 | OQL: A Query Language for Manipulating Object-oriented Databases | 1989 | VLDB | 5.9061974e-05
5,662 | Query Unnesting in Object-Oriented Databases | 1998 | SIGMOD | 5.3838456e-05
6,508 | DataSpread: Unifying Databases and Spreadsheets | 2015 | VLDB | 5.0335028e-05
6,822 | Skimmer: Rapid Scrolling of Relational Query Results | 2012 | SIGMOD | 4.9152454e-05

Semantically Similar Papers