Database Paper Browser


HoloDetect: Few-Shot Learning for Error Detection

Summary: HoloDetect performs few-shot error detection with a two-part model: rich data representations and a learned data-augmentation policy. Augmenting a small seed of clean data yields ~94% precision, ~93% recall, and an F1 gain of ~20 over ML baselines, while requiring ~3x fewer labels. (Summarized by gpt-5-nano on Feb 09 2026.)

Paper ID: 5700
Venue: SIGMOD
Year: 2019
PageRank: 0.00012497164
Overall rank: 1,337 | 90.71%
DOI: 10.1145/3299869.3319888

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by PageRank)

Showing 43 of 43 citing papers.

Rank | Citing paper | Year | Venue | PageRank
333 | Neo: A Learned Query Optimizer | 2019 | VLDB | 0.00027206884
517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05
2,753 | Complaint-driven Training Data Debugging for Query 2.0 | 2020 | SIGMOD | 8.1724339e-05
3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05
3,299 | SCODED: Statistical Constraint Oriented Data Error Detection | 2020 | SIGMOD | 7.2546659e-05
3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05
3,440 | Approximate Denial Constraints | 2020 | VLDB | 7.0918817e-05
3,711 | Saga: A Platform for Continuous Construction and Serving of Knowledge At Scale | 2022 | SIGMOD | 6.823609e-05
4,127 | A Statistical Perspective on Discovering Functional Dependencies in Noisy Data | 2020 | SIGMOD | 6.4310458e-05
4,424 | PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models | 2020 | SIGMOD | 6.198474e-05
5,096 | Auto-Transform: Learning-to-Transform by Patterns | 2020 | VLDB | 5.7011825e-05
5,222 | Enabling SQL-based Training Data Debugging for Federated Learning | 2022 | VLDB | 5.6210545e-05
5,978 | Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond | 2021 | SIGMOD | 5.2453012e-05
6,077 | The Fast and the Private: Task-based Dataset Search | 2024 | CIDR | 5.2229324e-05
6,134 | Finding Label and Model Errors in Perception Data With Learned Observation Assertions | 2022 | SIGMOD | 5.1943414e-05
6,187 | Semi-Supervised Data Cleaning with Raha and Baran | 2021 | CIDR | 5.1656857e-05
6,690 | Parallel Discrepancy Detection and Incremental Detection | 2021 | VLDB | 4.9621556e-05
7,704 | ExDRa: Exploratory Data Science on Federated Raw Data | 2021 | SIGMOD | 4.6733838e-05
7,838 | Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes | 2021 | SIGMOD | 4.6377995e-05
8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05
8,257 | Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines | 2023 | SIGMOD | 4.5487511e-05
8,590 | Exploratory Training: When Annotators Learn About Data | 2023 | SIGMOD | 4.4896282e-05
9,077 | VerifAI: Verified Generative AI | 2024 | CIDR | 4.4010762e-05
9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05
9,355 | Discovering Top-k Rules using Subjective and Objective Criteria | 2023 | SIGMOD | 4.3514328e-05
9,389 | DataVinci: Learning Syntactic and Semantic String Repairs | 2025 | SIGMOD | 4.3441378e-05
9,434 | Rock: Cleaning Data by Embedding ML in Logic Rules | 2024 | SIGMOD | 4.3430376e-05
9,487 | Making It Tractable to Catch Duplicates and Conflicts in Graphs | 2023 | SIGMOD | 4.3341665e-05
9,560 | MTSClean: Efficient Constraint-based Cleaning for Multi-Dimensional Time Series Data | 2024 | VLDB | 4.3254416e-05
9,777 | Data Augmentation for ML-driven Data Preparation and Integration | 2021 | VLDB | 4.2856106e-05
9,849 | Reptile: Aggregation-level Explanations for Hierarchical Data | 2022 | SIGMOD | 4.2721228e-05
9,963 | Parallel Rule Discovery from Large Datasets by Sampling | 2022 | SIGMOD | 4.2294678e-05
9,984 | Towards Scalable Visual Data Wrangling via Direct Manipulation | 2026 | CIDR | 4.1945683e-05
10,026 | Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints | 2026 | SIGMOD | 4.1945683e-05
10,598 | Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence | 2025 | VLDB | 4.1945683e-05
10,712 | DobLIX: A Dual-Objective Learned Index for Log-Structured Merge Trees | 2025 | VLDB | 4.1945683e-05
10,723 | UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow | 2025 | VLDB | 4.1945683e-05
10,821 | Demonstrating Matelda for Multi-Table Error Detection | 2025 | VLDB | 4.1945683e-05
11,000 | MisDetect: Iterative Mislabel Detection using Early Loss | 2024 | VLDB | 4.1945683e-05
11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05
11,369 | PGE: Robust Product Graph Embedding Learning for Error Detection | 2022 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by PageRank)

Showing 17 of 17 cited papers.

Only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database are counted here.


Semantically Similar Papers