Unit Testing Data with Deequ
Summary: Deequ is a Spark-based library that automates data quality verification at scale through a declarative constraints API that also accepts custom validation logic. It is open source and production-ready at Amazon, scales to billions of records, and supports incremental validation. (summarized by gpt-5-nano on Feb 09 2026)
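To make the idea of a declarative constraints API with custom validation concrete, here is a minimal sketch in plain Python. It is not Deequ's API (Deequ is a Scala library that runs on Spark); the `Check` class, its methods `is_complete`, `is_unique`, and `satisfies`, the `run` helper, and the sample rows are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Constraint = Tuple[str, Callable[[List[Row]], bool]]


@dataclass
class Check:
    """Hypothetical declarative check: a named list of constraints over rows."""
    description: str
    constraints: List[Constraint] = field(default_factory=list)

    def is_complete(self, column: str) -> "Check":
        # Every row must carry a non-None value in `column`.
        self.constraints.append(
            (f"is_complete({column})",
             lambda rows: all(r.get(column) is not None for r in rows))
        )
        return self

    def is_unique(self, column: str) -> "Check":
        # No value in `column` may appear more than once.
        def unique(rows: List[Row]) -> bool:
            values = [r.get(column) for r in rows]
            return len(values) == len(set(values))

        self.constraints.append((f"is_unique({column})", unique))
        return self

    def satisfies(self, name: str, predicate: Callable[[Row], bool]) -> "Check":
        # Custom validation: a user-supplied predicate must hold for every row.
        self.constraints.append(
            (name, lambda rows: all(predicate(r) for r in rows))
        )
        return self


def run(check: Check, rows: List[Row]) -> Dict[str, bool]:
    """Evaluate each declared constraint and report pass/fail by name."""
    return {name: fn(rows) for name, fn in check.constraints}


# Made-up sample data: one missing price and one duplicated id.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": None},
    {"id": 2, "price": 3.0},
]

check = (
    Check("orders sanity check")
    .is_complete("price")
    .is_unique("id")
    .satisfies("price_non_negative",
               lambda r: r["price"] is None or r["price"] >= 0)
)

results = run(check, rows)
print(results)
```

The caller only declares what must hold; how each constraint is evaluated is left to the runner. In a Spark-based system that separation is what would let the runner compute the underlying metrics at scale, whereas this sketch simply loops over an in-memory list.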
[Chart: Incoming Non-self Citations Over Time]
Authors
1. Sebastian Schelter
2. Felix Biessmann
3. Dustin Lange
4. Tammo Rukat
5. Philipp Schmidt
6. Stephan Seufert
7. Pierre Brunelle
8. Andrey Taptunov
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,838 | Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes | 2021 | SIGMOD | 4.6377995e-05 |
| 8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05 |
| 10,512 | Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables | 2025 | SIGMOD | 4.1945683e-05 |
| 10,821 | Demonstrating Matelda for Multi-Table Error Detection | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801 |
| 1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956 |
| 1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533 |
| 5,257 | Probabilistic Demand Forecasting at Scale | 2017 | VLDB | 5.6003925e-05 |