Database Paper Browser

Data Augmentation for ML-driven Data Preparation and Integration

Summary: A tutorial on data augmentation (DA) for ML-driven data preparation and integration in data management. It covers task-specific operators, interpolation, conditional generation, and policy learning, and links DA to active learning and weak supervision. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12523
Venue: VLDB
Year: 2021
Pagerank: 4.2856106e-05
Overall Rank: 9,777 | 31.99%
DOI: 10.14778/3476311.3476403

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 2 of 2 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 23 of 23 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
208 | Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach | 2001 | SIGMOD | 0.0003460594
221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824
254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555
300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466
513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342
1,215 | Snuba: Automating Weak Supervision to Label Training Data | 2019 | VLDB | 0.0001323375
1,267 | Foofah: Transforming Data By Example | 2017 | SIGMOD | 0.00012936483
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
1,533 | Example-driven Design of Efficient Record Matching Queries | 2007 | VLDB | 0.00011471971
1,546 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | 2015 | SIGMOD | 0.00011446851
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
1,914 | Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks | 2020 | SIGMOD | 0.00010109102
2,097 | Predictive Interaction for Data Transformation | 2015 | CIDR | 9.5489822e-05
2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05
2,421 | Data Synthesis based on Generative Adversarial Networks | 2018 | VLDB | 8.8514021e-05
2,767 | A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching | 2020 | SIGMOD | 8.1513883e-05
2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05
4,607 | Data Integration and Machine Learning: A Natural Synergy | 2018 | SIGMOD | 6.0538827e-05
4,884 | Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration | 2020 | VLDB | 5.8540287e-05
5,978 | Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond | 2021 | SIGMOD | 5.2453012e-05
6,526 | Data Collection and Quality Challenges for Deep Learning | 2020 | VLDB | 5.0267429e-05
7,613 | ADnEV: Cross-Domain Schema Matching using Deep Similarity Matrix Adjustment and Evaluation | 2020 | VLDB | 4.6961059e-05
8,042 | Transform-Data-by-Example (TDE): Extensible Data Transformation in Excel | 2018 | SIGMOD | 4.5994569e-05

Semantically Similar Papers