Database Paper Browser

Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond

Summary: Rotom is a meta-learned data augmentation (DA) framework for entity matching, data cleaning, and text classification. It introduces InvDA, a seq2seq-based DA operator, and a learned policy for combining DA operators, which reduces hyperparameter search and improves low-resource results beyond the prior state of the art. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6149
Venue: SIGMOD
Year: 2021
Pagerank: 5.2453012e-05
Overall Rank: 5,978 | 58.42%
DOI: 10.1145/3448016.3457258

Authors

Incoming Citations (Sorted by Pagerank)

Showing 11 of 11 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 30 of 30 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824
254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555
265 | A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification | 2005 | SIGMOD | 0.00029763412
300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466
509 | On Active Learning of Record Matching Packages | 2010 | SIGMOD | 0.00021409518
610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674
712 | Magellan: Toward Building Entity Matching Management Systems | 2016 | VLDB | 0.00017732426
791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664
814 | Entity Resolution: Theory, Practice & Open Challenges | 2012 | VLDB | 0.00016370594
903 | To Join or Not to Join? Thinking Twice about Joins before Feature Selection | 2016 | SIGMOD | 0.0001547016
1,215 | Snuba: Automating Weak Supervision to Label Training Data | 2019 | VLDB | 0.0001323375
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 0.00011869295
1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681
1,546 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | 2015 | SIGMOD | 0.00011446851
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
2,175 | Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services | 2017 | SIGMOD | 9.3644117e-05
2,767 | A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching | 2020 | SIGMOD | 8.1513883e-05
2,888 | Sato: Contextual Semantic Type Detection in Tables | 2020 | VLDB | 7.9594996e-05
2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05
3,067 | CrowdFill: Collecting Structured Data from the Crowd | 2014 | SIGMOD | 7.6180371e-05
3,773 | Cleaning Crowdsourced Labels Using Oracles for Statistical Classification | 2019 | VLDB | 6.7758649e-05
3,897 | SLiMFast: Guaranteed Results for Data Fusion and Source Reliability | 2017 | SIGMOD | 6.6554845e-05
4,129 | Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers? | 2018 | VLDB | 6.428887e-05
4,451 | CLAMShell: Speeding up Crowds for Low-latency Data Labeling | 2016 | VLDB | 6.1738675e-05
4,904 | Temporal Rules Discovery for Web Data Cleaning | 2016 | VLDB | 5.8399195e-05
6,042 | MDedup: Duplicate Detection with Matching Dependencies | 2020 | VLDB | 5.2405269e-05
7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05
7,117 | Crowdsourced Data Management: Overview and Challenges | 2017 | SIGMOD | 4.826509e-05

Semantically Similar Papers