Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff
Summary: Takes a principled approach to dataset versioning, analyzing the storage-recreation trade-off across six problem settings. Shows that many cases are intractable and offers heuristics adapted from delay-constrained scheduling and spanning-tree methods, along with a DATAHUB prototype. (summarized by gpt-5-nano on Feb 09 2026)
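To make the storage-recreation trade-off concrete, here is a minimal sketch, not the paper's algorithm. It assumes a hypothetical version graph in which node 0 is a dummy root, an edge from 0 to a version means storing that version in full, and any other edge means storing a delta between two versions. It further assumes that deltas are symmetric and that the cost to recreate a version equals the total size of everything read along its path from the root. Under those assumptions, a minimum spanning tree minimizes total storage, while a shortest-path tree minimizes each version's recreation cost. The function name `grow_tree` and the edge costs are illustrative.

```python
import heapq


def grow_tree(edges, shortest_paths):
    """Grow a spanning tree from the dummy root 0.

    edges: (u, v, cost) triples, treated as undirected (assumption).
    shortest_paths=False: Prim's rule (cheapest edge), minimizing total storage.
    shortest_paths=True:  Dijkstra's rule (cheapest root path), minimizing
                          each version's recreation cost.
    Returns (stored, recreate): cost stored per node, recreation cost per node.
    """
    adj = {}
    for u, v, cost in edges:
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))

    stored, recreate = {}, {}
    heap = [(0, 0, 0, 0)]  # (priority, node, parent, edge cost)
    while heap:
        _, node, parent, cost = heapq.heappop(heap)
        if node in stored:
            continue  # already attached to the tree
        stored[node] = cost
        recreate[node] = recreate.get(parent, 0) + cost
        for nxt, nxt_cost in adj.get(node, []):
            if nxt not in stored:
                key = recreate[node] + nxt_cost if shortest_paths else nxt_cost
                heapq.heappush(heap, (key, nxt, node, nxt_cost))
    return stored, recreate


# Hypothetical costs: full copies cost 10 to 12, deltas cost 2.
EDGES = [(0, 1, 10), (0, 2, 11), (0, 3, 12), (1, 2, 2), (2, 3, 2)]

for label, flag in (("min storage", False), ("min recreation", True)):
    stored, recreate = grow_tree(EDGES, flag)
    print(label, sum(stored.values()), max(recreate.values()))
```

On this toy input the storage-minimizing tree stores 14 units but needs 14 units to recreate the last version, whereas the recreation-minimizing tree stores 33 units and never needs more than 12. Settings that balance the two objectives sit between these extremes.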
Authors
1. Souvik Bhattacherjee
2. Amit Chavan
3. Silu Huang
4. Amol Deshpande
5. Aditya Parameswaran
Incoming Citations (Sorted by Pagerank)
Showing 26 of 26 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 293 | A Taxonomy of Time in Databases | 1985 | SIGMOD | 0.00028676087 |
| 676 | Archiving Scientific Data | 2002 | SIGMOD | 0.00018281665 |
| 1,281 | DataHub: Collaborative Data Science & Dataset Version Management at Scale | 2015 | CIDR | 0.00012854744 |
| 4,558 | Managing Structured Collections of Community Data | 2011 | CIDR | 6.0869516e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 676 | Archiving Scientific Data | 2002 | SIGMOD | 0.00018281665 |
| 1,923 | Reconciling while Tolerating Disagreement in Collaborative Data Sharing | 2006 | SIGMOD | 0.00010080761 |
| 9,908 | Keep Your Distributed Data Warehouse Consistent at a Minimal Cost | 2023 | SIGMOD | 4.2576943e-05 |
| 6,053 | Optimizing Machine Learning Workloads in Collaborative Environments | 2020 | SIGMOD | 5.2326838e-05 |
| 2,430 | Decibel: The Relational Dataset Branching System | 2016 | VLDB | 8.8330417e-05 |
| 5,280 | Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V | 2023 | VLDB | 5.5896735e-05 |
| 13,280 | Effective Data Versioning for Collaborative Data Analytics | 2020 | SIGMOD | - |
| 12,953 | Minimizing Time-Space Cost For Database Version Control (extended abstract) | 1988 | PODS | 4.1945683e-05 |
| 6,694 | Optimal Splitters for Temporal and Multi-version Databases | 2013 | SIGMOD | 4.9586454e-05 |
| 1,281 | DataHub: Collaborative Data Science & Dataset Version Management at Scale | 2015 | CIDR | 0.00012854744 |