Database Paper Browser

Big Data Integration

Summary: Big data integration (BDI) must cope with scale, dynamism, heterogeneity, and varied data quality across thousands of sources. The paper surveys progress in schema mapping, record linkage, and data fusion, and identifies open problems for the data-management community. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 10566
Venue: VLDB
Year: 2013
Pagerank: 0.00024372588
Overall Rank: 398 | 97.24%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 40 of 40 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
1,215 | Snuba: Automating Weak Supervision to Label Training Data | 2019 | VLDB | 0.0001323375
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
2,184 | A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data | 2014 | SIGMOD | 9.3429789e-05
2,209 | Data Integration: After the Teenage Years | 2017 | PODS | 9.2868035e-05
2,359 | Data Market Platforms: Trading Data Assets to Solve Data Problems | 2020 | VLDB | 8.9607667e-05
2,567 | Resolving Conflicts in Heterogeneous Data by Truth Discovery and Source Reliability Estimation | 2014 | SIGMOD | 8.5239306e-05
3,711 | Saga: A Platform for Continuous Construction and Serving of Knowledge At Scale | 2022 | SIGMOD | 6.823609e-05
3,897 | SLiMFast: Guaranteed Results for Data Fusion and Source Reliability | 2017 | SIGMOD | 6.6554845e-05
3,977 | BLAST: a Loosely Schema-aware Meta-blocking Approach for Entity Resolution | 2016 | VLDB | 6.5736268e-05
4,129 | Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers? | 2018 | VLDB | 6.428887e-05
4,607 | Data Integration and Machine Learning: A Natural Synergy | 2018 | SIGMOD | 6.0538827e-05
5,088 | TCUDB: Accelerating Database with Tensor Processors | 2022 | SIGMOD | 5.7072189e-05
5,251 | Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale | 2019 | SIGMOD | 5.6029615e-05
5,347 | Adaptive Rule Discovery for Labeling Text Data | 2021 | SIGMOD | 5.5560452e-05
5,958 | Fine-grained Concept Linking using Neural Networks in Healthcare | 2018 | SIGMOD | 5.2563968e-05
6,262 | Fast Shapley Value Computation in Data Assemblage Tasks as Cooperative Simple Games | 2024 | SIGMOD | 5.1349507e-05
6,412 | CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web | 2018 | VLDB | 5.0740036e-05
7,029 | Computational Fact Checking: A Content Management Perspective | 2018 | VLDB | 4.8563777e-05
7,052 | Pre-trained Embeddings for Entity Resolution: An Experimental Analysis | 2023 | VLDB | 4.8497453e-05
7,243 | Data Integration and Machine Learning: A Natural Synergy | 2018 | VLDB | 4.7913666e-05
8,696 | Effective Entity Augmentation By Querying External Data Sources | 2023 | VLDB | 4.4660032e-05
8,722 | Preference-aware Integration of Temporal Data | 2015 | VLDB | 4.4606662e-05
8,751 | Generations of Knowledge Graphs: The Crazy Ideas and the Business Impact | 2023 | VLDB | 4.456315e-05
9,020 | Entity Matching in the Wild: A Consistent and Versatile Framework to Unify Data in Industrial Applications | 2020 | SIGMOD | 4.4079449e-05
9,132 | A Time Machine for Information: Looking Back to Look Forward | 2015 | VLDB | 4.3896196e-05
9,461 | BrewER: Entity Resolution On-Demand | 2023 | VLDB | 4.3366491e-05
9,683 | Hierarchical Entity Resolution using an Oracle | 2022 | SIGMOD | 4.3047774e-05
9,855 | Progressive Entity Matching: A Design Space Exploration | 2025 | SIGMOD | 4.269353e-05
9,977 | A Vision for Autonomous Data Agent Collaboration: From Query-by-Integration to Query-by-Collaboration | 2026 | CIDR | 4.1945683e-05
10,090 | Integrating Vector Databases across Embedding Models | 2026 | SIGMOD | 4.1945683e-05
10,617 | Deduplicated Sampling On-Demand | 2025 | VLDB | 4.1945683e-05
11,006 | FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous Data | 2024 | VLDB | 4.1945683e-05
11,373 | Generalized Supervised Meta-blocking | 2022 | VLDB | 4.1945683e-05
11,389 | CDI-E: An Elastic Cloud Service for Data Engineering | 2022 | VLDB | 4.1945683e-05
11,629 | Leveraging Organizational Resources to Adapt Models to New Data Modalities | 2020 | VLDB | 4.1945683e-05
11,706 | Big Data Linkage for Product Specification Pages | 2018 | SIGMOD | 4.1945683e-05
11,906 | Knowledge Curation and Knowledge Fusion: Challenges, Models, and Applications | 2015 | SIGMOD | 4.1945683e-05
13,383 | FIT to Monitor Feed Quality | 2015 | VLDB | -

Outgoing Citations (Sorted by Pagerank)

Showing 0 of 0 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers