Database Paper Browser

Data Integration and Machine Learning: A Natural Synergy

Summary: A tutorial on the synergy between data integration and machine learning. It highlights ML-driven automation and human-in-the-loop pipelines that cut costs and improve accuracy, surveys ML-based data integration, explains why end-to-end ML needs clean, integrated data, and outlines cross-domain challenges. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 5586
Venue: SIGMOD
Year: 2018
Pagerank: 6.0538827e-05
Overall Rank: 4,607 | 67.96%
DOI: 10.1145/3183713.3197387

Authors

Incoming Citations (Sorted by Pagerank)

Showing 10 of 10 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 25 of 25 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
199 | Declarative Data Cleaning: Language, Model, and Algorithms | 2001 | VLDB | 0.00035041015
254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555
319 | Evaluation of entity resolution approaches on real-world match problems | 2010 | VLDB | 0.00027781866
398 | Big Data Integration | 2013 | VLDB | 0.00024372588
489 | Data Curation at Scale: The Data Tamer System | 2013 | CIDR | 0.00022030728
610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674
643 | Corleone: Hands-Off Crowdsourcing for Entity Matching | 2014 | SIGMOD | 0.00018754451
791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664
814 | Entity Resolution: Theory, Practice & Open Challenges | 2012 | VLDB | 0.00016370594
936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549
1,211 | Truth Finding on the Deep Web: Is the Problem Solved? | 2013 | VLDB | 0.00013257101
1,367 | Answering Table Queries on the Web using Column Keywords | 2012 | VLDB | 0.00012349783
1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956
1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681
1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905
2,126 | MacroBase: Prioritizing Attention in Fast Data | 2017 | SIGMOD | 9.4887794e-05
2,175 | Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services | 2017 | SIGMOD | 9.3644117e-05
2,420 | From Data Fusion to Knowledge Fusion | 2014 | VLDB | 8.8530994e-05
2,452 | Data Fusion – Resolving Data Conflicts for Integration | 2009 | VLDB | 8.7839322e-05
3,105 | Data X-Ray: A Diagnostic Tool for Data Errors | 2015 | SIGMOD | 7.5568954e-05
3,303 | Fonduer: Knowledge Base Construction from Richly Formatted Data | 2018 | SIGMOD | 7.2487486e-05
3,495 | Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources | 2015 | VLDB | 7.0400666e-05
3,897 | SLiMFast: Guaranteed Results for Data Fusion and Source Reliability | 2017 | SIGMOD | 6.6554845e-05
4,126 | Waldo: An Adaptive Human Interface for Crowd Entity Resolution | 2017 | SIGMOD | 6.4314729e-05

Semantically Similar Papers