Titian: Data Provenance Support in Spark
Summary: Titian integrates data provenance into Apache Spark so that data can be traced through transformations, enabling root-cause debugging in data-intensive scalable computing (DISC) workloads. It supports provenance capture and tracing at interactive speeds with modest overhead, typically less than 30% above baseline job execution time, and is far faster than prior tools.
(summarized by gpt-5-nano on Feb 09 2026)
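To make "tracing data through transformations" concrete, here is a minimal sketch in plain Python. It is not Titian's API or implementation: the helper names (`load`, `filter_records`, `flat_map`, `reduce_by_key`, `trace_back`) and the sample log lines are hypothetical, and the lineage is stored naively as a set of input line indices attached to each record.

```python
# Toy record-level lineage, assuming each record carries the set of input
# line indices it was derived from. Hypothetical helpers, not Titian's API.

def load(lines):
    # Tag every input line with its own index as its lineage.
    return [(line, frozenset([i])) for i, line in enumerate(lines)]

def filter_records(records, pred):
    # Surviving records keep their lineage unchanged.
    return [(value, lineage) for value, lineage in records if pred(value)]

def flat_map(records, fn):
    # Every output inherits the lineage of the record that produced it.
    return [(out, lineage) for value, lineage in records for out in fn(value)]

def reduce_by_key(records, fn):
    # Values are (key, v) pairs; aggregating unions the lineage of all
    # records that contributed to the same key.
    acc = {}
    for (key, v), lineage in records:
        if key in acc:
            prev_v, prev_lineage = acc[key]
            acc[key] = (fn(prev_v, v), prev_lineage | lineage)
        else:
            acc[key] = (v, lineage)
    return acc

def trace_back(result, key, lines):
    # Backward trace: from an output key to the input lines behind it.
    _, lineage = result[key]
    return [lines[i] for i in sorted(lineage)]

lines = [
    "INFO start",
    "ERROR disk full",
    "INFO retry",
    "ERROR disk full",
    "ERROR timeout",
]
errors = filter_records(load(lines), lambda s: s.startswith("ERROR"))
pairs = flat_map(errors, lambda s: [(s.split(" ", 1)[1], 1)])
counts = reduce_by_key(pairs, lambda a, b: a + b)

print(counts["disk full"][0])                 # count for the "disk full" key
print(trace_back(counts, "disk full", lines))  # input lines that produced it
```

Carrying full lineage sets on every record is the simplest possible design and would not scale. It only illustrates the kind of output-to-input question that a provenance system answers during root-cause debugging.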
- Paper ID: 11314
- Venue: VLDB
- Year: 2016
- Pagerank: 9.7437067e-05
- Overall Rank: 2,027 (85.91%)
- DOI:
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 21 of 21 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,152 | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | 2018 | SIGMOD | 9.4239787e-05 |
| 2,280 | SMOKE: Fine-grained Lineage at Interactive Speed | 2018 | VLDB | 9.1111033e-05 |
| 2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05 |
| 3,149 | Fine-Grained, Secure and Efficient Data Provenance on Blockchain Systems | 2019 | VLDB | 7.4741595e-05 |
| 4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05 |
| 5,086 | Improving Reproducibility of Data Science Pipelines through Transparent Provenance Capture | 2020 | VLDB | 5.7078462e-05 |
| 5,106 | Debugging Big Data Analytics in Spark with BigDebug | 2017 | SIGMOD | 5.6927181e-05 |
| 5,209 | Explaining Outputs in Modern Data Analytics | 2016 | VLDB | 5.629362e-05 |
| 6,981 | Dataset Relationship Management | 2019 | CIDR | 4.8743957e-05 |
| 7,720 | Provenance: On and Behind the Screens | 2016 | SIGMOD | 4.6684701e-05 |
| 7,833 | Dependency-Driven Analytics: a Compass for Uncharted Data Oceans | 2017 | CIDR | 4.6382648e-05 |
| 8,038 | Amber: A Debuggable Dataflow System Based on the Actor Model | 2020 | VLDB | 4.600124e-05 |
| 8,163 | Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science | 2021 | VLDB | 4.5723431e-05 |
| 8,394 | Hypothetical Reasoning via Provenance Abstraction | 2019 | SIGMOD | 4.527807e-05 |
| 10,024 | LPStream: Fine-grained Lazy Provenance for Stream Processing | 2026 | SIGMOD | 4.1945683e-05 |
| 10,419 | Unified Lineage System: Tracking Data Provenance at Scale | 2025 | SIGMOD | 4.1945683e-05 |
| 10,883 | IcedTea: Efficient and Responsive Time-Travel Debugging in Dataflow Systems | 2025 | VLDB | 4.1945683e-05 |
| 11,452 | Flow Provenance in Temporal Interaction Networks | 2021 | SIGMOD | 4.1945683e-05 |
| 11,647 | Ariadne: Online Provenance for Big Graph Analytics | 2019 | SIGMOD | 4.1945683e-05 |
| 11,710 | Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications | 2018 | SIGMOD | 4.1945683e-05 |
| 11,798 | Privacy-Preserving Network Provenance | 2017 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 11 of 11 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,405 | SparkCAD: Caching Anomalies Detector for Spark Applications | 2022 | VLDB | 4.1945683e-05 |
| 11,743 | DfAnalyzer: Runtime Dataflow Analysis of Scientific Applications using Provenance | 2018 | VLDB | 4.1945683e-05 |
| 9,704 | Debugging Missing Answers for Spark Queries over Nested Data with Breadcrumb | 2021 | VLDB | 4.3005882e-05 |
| 11,396 | DPDS: Assisting Data Science with Data Provenance | 2022 | VLDB | 4.1945683e-05 |
| 5,086 | Improving Reproducibility of Data Science Pipelines through Transparent Provenance Capture | 2020 | VLDB | 5.7078462e-05 |
| 8,163 | Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science | 2021 | VLDB | 4.5723431e-05 |
| 11,665 | Ursprung: Provenance for Large-Scale Analytics Environments | 2019 | SIGMOD | 4.1945683e-05 |
| 11,647 | Ariadne: Online Provenance for Big Graph Analytics | 2019 | SIGMOD | 4.1945683e-05 |
| 5,106 | Debugging Big Data Analytics in Spark with BigDebug | 2017 | SIGMOD | 5.6927181e-05 |
| 11,662 | Capturing and Querying Structural Provenance in Spark with Pebble | 2019 | SIGMOD | 4.1945683e-05 |