DPDS: Assisting Data Science with Data Provenance
Summary: DPDS captures provenance for Python/Pandas pipelines, using an observer pattern to track how dataframe elements change across transformations. The provenance is held in a Neo4j graph, and a UI lets users query it to justify data operations from raw data through to model training. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Adriane Chapman
2. Luca Lauro
3. Paolo Missier
4. Riccardo Torlone
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 2 of 2 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,427 | Towards Scalable Dataframe Systems | 2020 | VLDB | 0.0001204248 |
| 8,163 | Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science | 2021 | VLDB | 4.5723431e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 13,291 | Towards Understanding Data Analysis Workflows using a Large Notebook Corpus | 2019 | SIGMOD | - |
| 8,394 | Hypothetical Reasoning via Provenance Abstraction | 2019 | SIGMOD | 4.527807e-05 |
| 1,765 | Efficient Lineage Tracking For Scientific Workflows | 2008 | SIGMOD | 0.00010630348 |
| 11,743 | DfAnalyzer: Runtime Dataflow Analysis of Scientific Applications using Provenance | 2018 | VLDB | 4.1945683e-05 |
| 6,384 | A Demonstration of DBWipes: Clean as You Query | 2012 | VLDB | 5.0880333e-05 |
| 6,981 | Dataset Relationship Management | 2019 | CIDR | 4.8743957e-05 |
| 12,014 | A Provenance Framework for Data-Dependent Process Analysis | 2014 | VLDB | 4.1945683e-05 |
| 11,665 | Ursprung: Provenance for Large-Scale Analytics Environments | 2019 | SIGMOD | 4.1945683e-05 |
| 5,086 | Improving Reproducibility of Data Science Pipelines through Transparent Provenance Capture | 2020 | VLDB | 5.7078462e-05 |
| 8,163 | Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science | 2021 | VLDB | 4.5723431e-05 |