SplitDF: Splitting Dataframes for Memory-Efficient Data Analysis
Summary: Introduces "splitting", a lossless join decomposition that adds join keys to reduce redundancy in dataframes while preserving a unified tabular view, without requiring functional dependency (FD) discovery. SplitDF (Ibis/DuckDB), with automated SplitGen (Velox), achieves 19–61% memory savings with minimal API changes. (summarized by gpt-5-mini on Feb 09 2026)
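To make the idea of a lossless join decomposition with an added join key concrete, here is a minimal pure-Python sketch. It is not the paper's SplitDF or SplitGen implementation: the function names `split` and `unify`, the key column `_split_key`, and the sample rows are all hypothetical, and the columns to factor out are chosen by hand rather than discovered automatically.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split(rows: List[Row], group_cols: List[str],
          key_name: str = "_split_key") -> Tuple[List[Row], List[Row]]:
    """Factor group_cols out of rows into a lookup table.

    Returns (base, lookup): base keeps the remaining columns plus a
    surrogate join key; lookup holds one row per distinct combination
    of group_cols values. key_name is an illustrative choice.
    """
    key_of: Dict[tuple, int] = {}
    base: List[Row] = []
    lookup: List[Row] = []
    for row in rows:
        combo = tuple(row[c] for c in group_cols)
        if combo not in key_of:
            # First time this combination is seen: assign a new key.
            key_of[combo] = len(lookup)
            lookup.append({key_name: key_of[combo],
                           **{c: row[c] for c in group_cols}})
        rest = {c: v for c, v in row.items() if c not in group_cols}
        rest[key_name] = key_of[combo]
        base.append(rest)
    return base, lookup


def unify(base: List[Row], lookup: List[Row],
          key_name: str = "_split_key") -> List[Row]:
    """Rebuild the unified tabular view by joining base with lookup."""
    by_key = {r[key_name]: r for r in lookup}
    out: List[Row] = []
    for row in base:
        merged = {c: v for c, v in row.items() if c != key_name}
        match = by_key[row[key_name]]
        merged.update({c: v for c, v in match.items() if c != key_name})
        out.append(merged)
    return out


# Hypothetical sample data with repeated (city, country) pairs.
rows = [
    {"order_id": 1, "city": "Paris", "country": "France", "amount": 10.0},
    {"order_id": 2, "city": "Paris", "country": "France", "amount": 12.5},
    {"order_id": 3, "city": "Lyon", "country": "France", "amount": 7.0},
    {"order_id": 4, "city": "Paris", "country": "France", "amount": 3.0},
]

base, lookup = split(rows, ["city", "country"])
restored = unify(base, lookup)
```

Joining `base` and `lookup` on the surrogate key reproduces the original rows, which is what makes the decomposition lossless. Whether the split form is actually smaller depends on how often the factored-out values repeat and how wide they are; this toy example is too small to show savings.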
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 14 of 14 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,118 | AdaptDB: Adaptive Partitioning for Distributed Joins | 2017 | VLDB | 5.6820984e-05 |
| 8,275 | Adaptive Factorization Using Linear-Chained Hash Tables | 2025 | CIDR | 4.5439841e-05 |
| 2,443 | Data Management for Data Science: Towards Embedded Analytics | 2020 | CIDR | 8.8078476e-05 |
| 4,773 | PolyFrame: A Retargetable Query-based Approach to Scaling Dataframes | 2021 | VLDB | 5.9320139e-05 |
| 10,372 | Data Chunk Compaction in Vectorized Execution | 2025 | SIGMOD | 4.1945683e-05 |
| 3,763 | Flexible Rule-Based Decomposition and Metadata Independence in Modin: A Parallel Dataframe System | 2022 | VLDB | 6.7801795e-05 |
| 1,427 | Towards Scalable Dataframe Systems | 2020 | VLDB | 1.204248e-04 |
| 10,635 | Saving Private Hash Join | 2025 | VLDB | 4.1945683e-05 |
| 6,541 | ConnectorX: Accelerating Data Loading From Databases to Dataframes | 2022 | VLDB | 5.0216945e-05 |
| 8,915 | DQDF: Data-Quality-Aware Dataframes | 2022 | VLDB | 4.427232e-05 |