Improving Reproducibility of Data Science Pipelines through Transparent Provenance Capture
Summary: URSPRUNG integrates with execution environments to automatically capture static and runtime configuration without requiring code changes. It fuses system-level provenance with application-level signals (logs, stdout) via a domain-specific language (DSL), incurring about 4% overhead. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Lukas Rupprecht
2. James C. Davis
3. Constantine Arnold
4. Yaniv Gur
5. Deepavali Bhagwat
Incoming Citations (Sorted by Pagerank)
Showing 7 of 7 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05 |
| 8,624 | A Study of Database Performance Sensitivity to Experiment Settings | 2022 | VLDB | 4.483049e-05 |
| 8,729 | OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs | 2023 | VLDB | 4.4582221e-05 |
| 10,419 | Unified Lineage System: Tracking Data Provenance at Scale | 2025 | SIGMOD | 4.1945683e-05 |
| 10,842 | ML-Asset Management: Curation, Discovery, and Utilization | 2025 | VLDB | 4.1945683e-05 |
| 10,982 | On the Feasibility and Benefits of Extensive Evaluation | 2024 | SIGMOD | 4.1945683e-05 |
| 11,452 | Flow Provenance in Temporal Interaction Networks | 2021 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674 |
| 1,413 | VisTrails: Visualization meets Data Management | 2006 | SIGMOD | 0.00012121257 |
| 2,027 | Titian: Data Provenance Support in Spark | 2016 | VLDB | 9.7437067e-05 |
| 2,028 | Putting Lipstick on Pig: Enabling Database-style Workflow Provenance | 2012 | VLDB | 9.7433981e-05 |
| 2,430 | Decibel: The Relational Dataset Branching System | 2016 | VLDB | 8.8330417e-05 |
| 2,463 | noWorkflow: a Tool for Collecting, Analyzing, and Managing Provenance from Python Scripts | 2017 | VLDB | 8.7561396e-05 |
| 3,700 | RAMP: A System for Capturing and Tracing Provenance in MapReduce Workflows | 2011 | VLDB | 6.8307955e-05 |
| 3,875 | Cloudy with High Chance of DBMS: A 10-year Prediction for Enterprise-Grade ML | 2020 | CIDR | 6.675257e-05 |
| 7,833 | Dependency-Driven Analytics: a Compass for Uncharted Data Oceans | 2017 | CIDR | 4.6382648e-05 |