| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824 |
| 254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555 |
| 265 | A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification | 2005 | SIGMOD | 0.00029763412 |
| 300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466 |
| 509 | On Active Learning of Record Matching Packages | 2010 | SIGMOD | 0.00021409518 |
| 610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674 |
| 712 | Magellan: Toward Building Entity Matching Management Systems | 2016 | VLDB | 0.00017732426 |
| 791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664 |
| 814 | Entity Resolution: Theory, Practice & Open Challenges | 2012 | VLDB | 0.00016370594 |
| 903 | To Join or Not to Join? Thinking Twice about Joins before Feature Selection | 2016 | SIGMOD | 0.0001547016 |
| 1,215 | Snuba: Automating Weak Supervision to Label Training Data | 2019 | VLDB | 0.0001323375 |
| 1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164 |
| 1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 0.00011869295 |
| 1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681 |
| 1,546 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | 2015 | SIGMOD | 0.00011446851 |
| 1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378 |
| 2,175 | Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services | 2017 | SIGMOD | 9.3644117e-05 |
| 2,767 | A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching | 2020 | SIGMOD | 8.1513883e-05 |
| 2,888 | Sato: Contextual Semantic Type Detection in Tables | 2020 | VLDB | 7.9594996e-05 |
| 2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05 |
| 3,067 | CrowdFill: Collecting Structured Data from the Crowd | 2014 | SIGMOD | 7.6180371e-05 |
| 3,773 | Cleaning Crowdsourced Labels Using Oracles for Statistical Classification | 2019 | VLDB | 6.7758649e-05 |
| 3,897 | SLiMFast: Guaranteed Results for Data Fusion and Source Reliability | 2017 | SIGMOD | 6.6554845e-05 |
| 4,129 | Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers? | 2018 | VLDB | 6.428887e-05 |
| 4,451 | CLAMShell: Speeding up Crowds for Low-latency Data Labeling | 2016 | VLDB | 6.1738675e-05 |
| 4,904 | Temporal Rules Discovery for Web Data Cleaning | 2016 | VLDB | 5.8399195e-05 |
| 6,042 | MDedup: Duplicate Detection with Matching Dependencies | 2020 | VLDB | 5.2405269e-05 |
| 7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05 |
| 7,117 | Crowdsourced Data Management: Overview and Challenges | 2017 | SIGMOD | 4.826509e-05 |