Certain and Approximately Certain Models for Statistical Learning

Summary: A unified framework for deciding when imputation is unnecessary for training accurate statistical models on incomplete data. Efficient, theory-backed algorithms certify certain and approximately certain models across common ML paradigms, often avoiding costly imputation with little overhead. (summarized by gpt-5.4-mini on May 24 2026)

Paper ID: 6892
Venue: SIGMOD
Year: 2024
Pagerank: 4.1905499e-05
Overall Rank: 10,956 | 23.86%
DOI: 10.1145/3654929

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 5 of 5 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
788	ActiveClean: Interactive Data Cleaning For Statistical Modeling	2016	VLDB	0.00016618698
2,308	Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions	2021	VLDB	9.0634287e-05
4,103	GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data	2023	SIGMOD	6.4460899e-05
5,026	Adaptive Data Augmentation for Supervised Learning over Missing Data	2021	VLDB	5.7451454e-05
7,868	Learning Over Dirty Data Without Cleaning	2020	SIGMOD	4.6276013e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
10,652	Still More Shades of Null: An Evaluation Suite for Responsible Missing Value Imputation	2025	VLDB	4.1905499e-05
2,575	Query Optimization for Dynamic Imputation	2017	VLDB	8.5100213e-05
4,103	GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data	2023	SIGMOD	6.4460899e-05
9,855	In-Database Data Imputation	2024	SIGMOD	4.2652623e-05
11,053	Win-Win: On Simultaneous Clustering and Imputing over Incomplete Data	2024	VLDB	4.1905499e-05
2,308	Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions	2021	VLDB	9.0634287e-05
8,139	Fast and Reliable Missing Data Contingency Analysis with Predicate-Constraints	2020	SIGMOD	4.5727142e-05
4,331	Missing Value Imputation on Multidimensional Time Series	2021	VLDB	6.2744869e-05
3,313	Efficient and Effective Data Imputation with Influence Functions	2022	VLDB	7.2336734e-05
6,601	Missing Data Imputation with Uncertainty-Driven Network	2024	SIGMOD	4.9924633e-05