Database Paper Browser

Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science

Summary: Captures fine-grained, element-level provenance for ML preprocessing pipelines. The paper formalizes a core set of preprocessing operators and their provenance patterns, introduces an application-level Python library that captures this provenance to support debugging queries, and evaluates its scalability and overhead on real pipelines. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12571
Venue: VLDB
Year: 2021
Pagerank: 4.5723431e-05
Overall Rank: 8,163 | 43.22%
DOI: 10.14778/3436905.3436911

Authors

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
9,118 | Towards Observability for Production Machine Learning Pipelines | 2022 | VLDB | 4.3928288e-05
9,231 | Modyn: Data-Centric Machine Learning Pipeline Orchestration | 2025 | SIGMOD | 4.3690661e-05
10,419 | Unified Lineage System: Tracking Data Provenance at Scale | 2025 | SIGMOD | 4.1945683e-05
11,396 | DPDS: Assisting Data Science with Data Provenance | 2022 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers