Database Paper Browser


Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions

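Summary: Extends the database notion of Certain Answers to machine learning via Certain Predictions (CP). Two CP queries are studied: checking whether a prediction is certain (Q1) and counting the possible worlds that support each prediction (Q2). The work focuses on nearest-neighbor classifiers, for which CP queries can be answered efficiently. CPClean uses CP to guide the cleaning of ML training data; on datasets with systematic missing values, it narrows the accuracy gap to models trained on clean data while cleaning only about 36% of the data, outperforming BoostClean. (summarized by gpt-5-nano on Feb 09 2026)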
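To make the two CP queries concrete, here is a minimal brute-force sketch in Python. It assumes each incomplete training row is represented by a finite list of candidate repairs and that the classifier is K-NN with squared Euclidean distance. The names `knn_predict`, `count_predictions`, and `is_certain`, and the toy data, are hypothetical illustrations rather than the paper's algorithms: this version enumerates every possible world, which is exponential in the number of incomplete rows, whereas (per the summary) the paper focuses on nearest-neighbor classifiers because they make CP queries tractable.

```python
from collections import Counter
from itertools import product

# Hypothetical representation: a point is a tuple of floats; each training row
# holds the list of its candidate repairs (one candidate if complete) and a label.
Point = tuple[float, ...]
Row = tuple[list[Point], str]


def knn_predict(train: list[tuple[Point, str]], x: Point, k: int) -> str:
    """Plain K-NN on one complete training set (squared Euclidean distance)."""
    # sorted() is stable, so distance ties keep training-set order
    ranked = sorted(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
    votes = Counter(label for _, label in ranked[:k])
    top = max(votes.values())
    # Break vote ties deterministically by picking the smallest label
    return min(label for label, c in votes.items() if c == top)


def count_predictions(rows: list[Row], x: Point, k: int = 1) -> Counter:
    """Q2 (counting): number of possible worlds that predict each label for x."""
    labels = [label for _, label in rows]
    counts: Counter = Counter()
    # Each possible world picks exactly one candidate repair per row
    for world in product(*(cands for cands, _ in rows)):
        counts[knn_predict(list(zip(world, labels)), x, k)] += 1
    return counts


def is_certain(rows: list[Row], x: Point, k: int = 1) -> bool:
    """Q1 (checking): the prediction for x is certain if every world yields the same label."""
    return len(count_predictions(rows, x, k)) == 1


# Toy data: the third row has a missing value with two candidate repairs
rows: list[Row] = [
    ([(0.0, 0.0)], "a"),
    ([(5.0, 5.0)], "b"),
    ([(1.0, 1.0), (4.0, 4.0)], "b"),
]
print(is_certain(rows, (0.0, 0.5)))         # True: both worlds predict "a"
print(is_certain(rows, (1.5, 1.5)))         # False: the worlds disagree
print(count_predictions(rows, (1.5, 1.5)))  # one world predicts "a", one predicts "b"
```

Passing a larger `k` runs the same enumeration under majority voting. The cost grows with the product of the candidate-list sizes, so this sketch only suits toy inputs.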

Paper ID: 12429
Venue: VLDB
Year: 2021
Pagerank: 9.0668832e-05
Overall Rank: 2,302 | 83.99%
DOI: 10.14778/3430915.3430917


Authors

Incoming Citations (Sorted by Pagerank)

Showing 18 of 18 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
4,102 | GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data | 2023 | SIGMOD | 6.4522929e-05
5,429 | DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data | 2023 | SIGMOD | 5.5087325e-05
7,704 | ExDRa: Exploratory Data Science on Federated Raw Data | 2021 | SIGMOD | 4.6733838e-05
8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05
8,743 | CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning | 2024 | SIGMOD | 4.456315e-05
9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05
9,856 | In-Database Data Imputation | 2024 | SIGMOD | 4.269353e-05
10,463 | Zorro: Quantifying Uncertainty in Models & Predictions Arising from Dirty Data | 2025 | SIGMOD | 4.1945683e-05
10,523 | Scalable Complex Event Processing on Video Streams | 2025 | SIGMOD | 4.1945683e-05
10,528 | Two Birds with One Stone: Efficient Deep Learning over Mislabeled Data through Subset Selection | 2025 | SIGMOD | 4.1945683e-05
10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05
10,644 | Still More Shades of Null: An Evaluation Suite for Responsible Missing Value Imputation | 2025 | VLDB | 4.1945683e-05
10,953 | Certain and Approximately Certain Models for Statistical Learning | 2024 | SIGMOD | 4.1945683e-05
11,050 | Win-Win: On Simultaneous Clustering and Imputing over Incomplete Data | 2024 | VLDB | 4.1945683e-05
11,052 | Efficiently Mitigating the Impact of Data Drift on Machine Learning Pipelines | 2024 | VLDB | 4.1945683e-05
11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05
11,178 | LinCQA: Faster Consistent Query Answering with Linear Time Guarantees | 2023 | SIGMOD | 4.1945683e-05
11,431 | Ease.ML: A Lifecycle Management System for MLDev and MLOps | 2021 | CIDR | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 7 of 7 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers