Database Paper Browser

Unit Testing Data with Deequ

Summary: Deequ is a Spark-based library that automates data quality verification at scale through a declarative constraints API, with support for custom validation. It is open source, used in production at Amazon, scales to billions of records, and supports incremental validation. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 5718
Venue: SIGMOD
Year: 2019
Pagerank: 4.8693227e-05
Overall Rank: 6,993 | 51.36%
DOI: 10.1145/3299869.3320210

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 4 of 4 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801
1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956
1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533
5,257 | Probabilistic Demand Forecasting at Scale | 2017 | VLDB | 5.6003925e-05

Semantically Similar Papers