TensorFlow Data Validation: Data Analysis and Validation in Continuous ML Pipelines
Summary: TensorFlow Data Validation (TFDV) offers scalable data analysis and validation for continuous ML pipelines, treating data quality as a first-class concern. Integrated with TensorFlow Extended (TFX), it provides production-grade data monitoring, schema validation, and anomaly detection. It is open source and widely adopted. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Emily Caveness
2. Paul Suganthan G. C.
3. Zhuo Peng
4. Neoklis Polyzotis
5. Sudip Roy
6. Martin Zinkevich
Incoming Citations (Sorted by Pagerank)
Showing 6 of 6 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05 |
| 8,915 | DQDF: Data-Quality-Aware Dataframes | 2022 | VLDB | 4.427232e-05 |
| 10,771 | Unlocking the Power of CI/CD for Data Pipelines in Distributed Data Warehouses | 2025 | VLDB | 4.1945683e-05 |
| 10,867 | T-Assess: An Efficient Data Quality Assessment System Tailored for Trajectory Data | 2025 | VLDB | 4.1945683e-05 |
| 11,280 | CM-Explorer: Dissecting Data Ingestion Problems | 2023 | VLDB | 4.1945683e-05 |
| 11,342 | FILA: Online Auditing of Machine Learning Model Accuracy under Finite Labelling Budget | 2022 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 2 of 2 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |