Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions
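Summary: Extends the database notion of Certain Answers to machine learning through Certain Predictions (CP). The paper studies two CP queries, whether a certain prediction exists (Q1) and CP counts (Q2), focusing on nearest neighbor (NN) classifiers, for which these queries can be answered. CPClean uses CP to guide the cleaning of ML training data; on datasets with systematic missingness, it closes the gap by cleaning ~36% of the data, outperforming BoostClean.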
(summarized by gpt-5-nano on Feb 09 2026)
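The two queries in the summary can be illustrated with a brute-force sketch. It is not the paper's algorithm: it simply enumerates every possible world of an incomplete training set, which is exponential in the number of incomplete rows. It assumes each training row comes with a finite list of candidate repairs, and the names `knn_predict` and `cp_queries` are hypothetical, chosen for this example only.

```python
from collections import Counter
from itertools import product


def knn_predict(train, labels, x, k=1):
    """Plain k-NN vote with squared Euclidean distance.

    Distance ties are broken by row index (stable sort); vote ties are
    broken by the smallest label. Both rules are assumptions of this sketch.
    """
    order = sorted(
        range(len(train)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], x)),
    )
    votes = Counter(labels[i] for i in order[:k])
    return min(votes, key=lambda lab: (-votes[lab], lab))


def cp_queries(candidates, labels, x, k=1):
    """Answer Q1 and Q2 for test point x by enumerating possible worlds.

    candidates[i] lists the possible feature vectors of training row i
    (a single entry means the row is complete). Returns (certain, counts):
    certain is True when every world predicts the same label (Q1), and
    counts maps each label to the number of worlds predicting it (Q2).
    """
    counts = Counter()
    for world in product(*candidates):
        counts[knn_predict(world, labels, x, k)] += 1
    return len(counts) == 1, dict(counts)


if __name__ == "__main__":
    labels = [0, 1, 1]
    # Row 0 has a missing value with two candidate repairs.
    near = [[(0.0,), (1.0,)], [(5.0,)], [(6.0,)]]
    far = [[(0.0,), (4.0,)], [(5.0,)], [(6.0,)]]
    print(cp_queries(near, labels, (0.5,)))  # both worlds agree
    print(cp_queries(far, labels, (3.0,)))   # the worlds disagree
```

When Q1 holds for a test point, no choice among the candidate repairs can change its prediction, which is the intuition behind using CP to decide which rows are worth cleaning.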
- Paper ID: 12429
- Venue: VLDB
- Year: 2021
- Pagerank: 9.0668832e-05
- Overall Rank: 2,302 | 83.99%
- DOI: 10.14778/3430915.3430917
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 18 of 18 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,102 | GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data | 2023 | SIGMOD | 6.4522929e-05 |
| 5,429 | DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data | 2023 | SIGMOD | 5.5087325e-05 |
| 7,704 | ExDRa: Exploratory Data Science on Federated Raw Data | 2021 | SIGMOD | 4.6733838e-05 |
| 8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05 |
| 8,743 | CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning | 2024 | SIGMOD | 4.456315e-05 |
| 9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05 |
| 9,856 | In-Database Data Imputation | 2024 | SIGMOD | 4.269353e-05 |
| 10,463 | Zorro: Quantifying Uncertainty in Models & Predictions Arising from Dirty Data | 2025 | SIGMOD | 4.1945683e-05 |
| 10,523 | Scalable Complex Event Processing on Video Streams | 2025 | SIGMOD | 4.1945683e-05 |
| 10,528 | Two Birds with One Stone: Efficient Deep Learning over Mislabeled Data through Subset Selection | 2025 | SIGMOD | 4.1945683e-05 |
| 10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05 |
| 10,644 | Still More Shades of Null: An Evaluation Suite for Responsible Missing Value Imputation | 2025 | VLDB | 4.1945683e-05 |
| 10,953 | Certain and Approximately Certain Models for Statistical Learning | 2024 | SIGMOD | 4.1945683e-05 |
| 11,050 | Win-Win: On Simultaneous Clustering and Imputing over Incomplete Data | 2024 | VLDB | 4.1945683e-05 |
| 11,052 | Efficiently Mitigating the Impact of Data Drift on Machine Learning Pipelines | 2024 | VLDB | 4.1945683e-05 |
| 11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05 |
| 11,178 | LinCQA: Faster Consistent Query Answering with Linear Time Guarantees | 2023 | SIGMOD | 4.1945683e-05 |
| 11,431 | Ease.ML: A Lifecycle Management System for MLDev and MLOps | 2021 | CIDR | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,946 | Efficient Probabilistic Reverse Nearest Neighbor Query Processing on Uncertain Data | 2011 | VLDB | 4.8909775e-05 |
| 4,724 | Nearest-Neighbor Searching Under Uncertainty | 2012 | PODS | 5.9697823e-05 |
| 4,536 | Data Series Progressive Similarity Search with Probabilistic Quality Guarantees | 2020 | SIGMOD | 6.104642e-05 |
| 1,542 | Efficient Search for the Top-k Probable Nearest Neighbors in Uncertain Databases | 2008 | VLDB | 0.00011456321 |
| 9,351 | On Efficient Approximate Queries over Machine Learning Models | 2023 | VLDB | 4.3524472e-05 |
| 11,595 | Minimization of Classifier Construction Cost for Search Queries | 2020 | SIGMOD | 4.1945683e-05 |
| 4,102 | GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data | 2023 | SIGMOD | 6.4522929e-05 |
| 9,761 | Explaining k-Nearest Neighbors: Abductive and Counterfactual Explanations | 2025 | PODS | 4.2856106e-05 |
| 5,253 | Enriching Data Imputation with Extensive Similarity Neighbors | 2015 | VLDB | 5.6014916e-05 |
| 10,953 | Certain and Approximately Certain Models for Statistical Learning | 2024 | SIGMOD | 4.1945683e-05 |