Database Paper Browser

Titian: Data Provenance Support in Spark

Summary: Titian integrates data provenance into Apache Spark so that data can be traced through transformations, enabling root-cause debugging in data-intensive scalable computing (DISC) workloads. It answers provenance queries at interactive speeds, far faster than prior tools, with modest capture overhead, typically under 30% above baseline job execution time. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11314
Venue: VLDB
Year: 2016
Pagerank: 9.7437067e-05
Overall Rank: 2,027 | 85.91%
DOI: -

Authors

Incoming Citations (Sorted by Pagerank)

Showing 21 of 21 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,152 | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | 2018 | SIGMOD | 9.4239787e-05
2,280 | SMOKE: Fine-grained Lineage at Interactive Speed | 2018 | VLDB | 9.1111033e-05
2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05
3,149 | Fine-Grained, Secure and Efficient Data Provenance on Blockchain Systems | 2019 | VLDB | 7.4741595e-05
4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05
5,086 | Improving Reproducibility of Data Science Pipelines through Transparent Provenance Capture | 2020 | VLDB | 5.7078462e-05
5,106 | Debugging Big Data Analytics in Spark with BigDebug | 2017 | SIGMOD | 5.6927181e-05
5,209 | Explaining Outputs in Modern Data Analytics | 2016 | VLDB | 5.629362e-05
6,981 | Dataset Relationship Management | 2019 | CIDR | 4.8743957e-05
7,720 | Provenance: On and Behind the Screens | 2016 | SIGMOD | 4.6684701e-05
7,833 | Dependency-Driven Analytics: a Compass for Uncharted Data Oceans | 2017 | CIDR | 4.6382648e-05
8,038 | Amber: A Debuggable Dataflow System Based on the Actor Model | 2020 | VLDB | 4.600124e-05
8,163 | Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science | 2021 | VLDB | 4.5723431e-05
8,394 | Hypothetical Reasoning via Provenance Abstraction | 2019 | SIGMOD | 4.527807e-05
10,024 | LPStream: Fine-grained Lazy Provenance for Stream Processing | 2026 | SIGMOD | 4.1945683e-05
10,419 | Unified Lineage System: Tracking Data Provenance at Scale | 2025 | SIGMOD | 4.1945683e-05
10,883 | IcedTea: Efficient and Responsive Time-Travel Debugging in Dataflow Systems | 2025 | VLDB | 4.1945683e-05
11,452 | Flow Provenance in Temporal Interaction Networks | 2021 | SIGMOD | 4.1945683e-05
11,647 | Ariadne: Online Provenance for Big Graph Analytics | 2019 | SIGMOD | 4.1945683e-05
11,710 | Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications | 2018 | SIGMOD | 4.1945683e-05
11,798 | Privacy-Preserving Network Provenance | 2017 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 11 of 11 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers