Unlocking the Power of CI/CD for Data Pipelines in Distributed Data Warehouses
Summary: The paper describes a production-config-driven CI approach that enables scalable, isolated in-production tests in large distributed warehouses, cutting overhead and achieving 94.5% pre-production issue detection at YouTube. A lineage-aware algebraic impact analysis automatically propagates data-quality checks across pipelines to ensure consistency. (summarized by gpt-5-mini on Feb 09 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Hongtao Yang
2. Zhichen Xu
3. Sergey Yudin
4. Andrew Davidson
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 3 of 3 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,943 | Procella: Unifying serving and analytical data at YouTube | 2019 | VLDB | 1.0012569e-04 |
| 3,491 | TensorFlow Data Validation: Data Analysis and Validation in Continuous ML Pipelines | 2020 | SIGMOD | 7.0451276e-05 |
| 9,908 | Keep Your Distributed Data Warehouse Consistent at a Minimal Cost | 2023 | SIGMOD | 4.2576943e-05 |