| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164 |
| 1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378 |
| 2,158 | Uni-Detect: A Unified Approach to Automated Error Detection in Tables | 2019 | SIGMOD | 9.4141354e-05 |
| 2,302 | Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions | 2021 | VLDB | 9.0668832e-05 |
| 2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05 |
| 2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05 |
| 2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05 |
| 3,299 | SCODED: Statistical Constraint Oriented Data Error Detection | 2020 | SIGMOD | 7.2546659e-05 |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 3,773 | Cleaning Crowdsourced Labels Using Oracles for Statistical Classification | 2019 | VLDB | 6.7758649e-05 |
| 4,126 | Waldo: An Adaptive Human Interface for Crowd Entity Resolution | 2017 | SIGMOD | 6.4314729e-05 |
| 4,806 | Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers | 2019 | SIGMOD | 5.9092698e-05 |
| 5,096 | Auto-Transform: Learning-to-Transform by Patterns | 2020 | VLDB | 5.7011825e-05 |
| 5,729 | KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing | 2015 | VLDB | 5.3506368e-05 |
| 5,978 | Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond | 2021 | SIGMOD | 5.2453012e-05 |
| 6,182 | Top-K Deep Video Analytics: A Probabilistic Approach | 2021 | SIGMOD | 5.1682689e-05 |
| 6,187 | Semi-Supervised Data Cleaning with Raha and Baran | 2021 | CIDR | 5.1656857e-05 |
| 6,416 | Synthesizing Type-Detection Logic for Rich Semantic Data Types using Open-source Code | 2018 | SIGMOD | 5.072267e-05 |
| 7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05 |
| 7,223 | Akane: Perplexity-Guided Time Series Data Cleaning | 2024 | SIGMOD | 4.7965857e-05 |
| 7,292 | Subjective Knowledge Base Construction Powered By Crowdsourcing and Knowledge Base | 2018 | SIGMOD | 4.7740174e-05 |
| 7,766 | ICARUS: Minimizing Human Effort in Iterative Data Completion | 2018 | VLDB | 4.6564959e-05 |
| 9,043 | Query-Guided Resolution in Uncertain Databases | 2023 | SIGMOD | 4.4039656e-05 |
| 9,221 | VisClean: Interactive Cleaning for Progressive Visualization | 2020 | VLDB | 4.3699444e-05 |
| 9,240 | ZIP: Lazy Imputation during Query Processing | 2024 | VLDB | 4.3690661e-05 |
| 9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05 |
| 9,479 | Data Imputation with Limited Data Redundancy Using Data Lakes | 2025 | VLDB | 4.3341665e-05 |
| 9,771 | EasyDR: A Human-in-the-loop Error Detection and Repair Platform for Holistic Table Cleaning | 2022 | VLDB | 4.2856106e-05 |
| 9,777 | Data Augmentation for ML-driven Data Preparation and Integration | 2021 | VLDB | 4.2856106e-05 |
| 9,896 | Towards Interpretable and Learnable Risk Analysis for Entity Resolution | 2020 | SIGMOD | 4.2600049e-05 |
| 10,003 | Clustering with Set Outliers and Applications in Relational Clustering | 2026 | PODS | 4.1945683e-05 |
| 10,026 | Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints | 2026 | SIGMOD | 4.1945683e-05 |
| 11,006 | FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous Data | 2024 | VLDB | 4.1945683e-05 |
| 11,069 | Hardware-Efficient Data Imputation through DBMS Extensibility | 2024 | VLDB | 4.1945683e-05 |
| 11,178 | LinCQA: Faster Consistent Query Answering with Linear Time Guarantees | 2023 | SIGMOD | 4.1945683e-05 |
| 11,399 | ActivePDB: Active Probabilistic Databases | 2022 | VLDB | 4.1945683e-05 |
| 11,454 | Contextual Data Cleaning with Ontology FDs | 2021 | SIGMOD | 4.1945683e-05 |
| 11,536 | LOCATER: Cleaning WiFi Connectivity Datasets for Semantic Localization | 2021 | VLDB | 4.1945683e-05 |
| 11,680 | WiClean: A System for Fixing Wikipedia Interlinks Using Revision History Patterns | 2019 | VLDB | 4.1945683e-05 |
| 11,788 | CDB: Optimizing Queries with Crowd-Based Selections and Joins | 2017 | SIGMOD | 4.1945683e-05 |
| 11,816 | DOCS: Domain-Aware Crowdsourcing System | 2017 | VLDB | 4.1945683e-05 |