Data Integration: After the Teenage Years
Summary: Surveys the evolution of data integration (DI), from view-based query rewriting to AI-enabled extraction, uncertainty handling, and learning across structured and unstructured sources. Advocates two priorities: maturing open-source integration tooling (BigGorilla) and developing practical, systematic methods for fusing structured and unstructured data.
(summarized by gpt-5-mini on Feb 09 2026)
- Paper ID: 1736
- Venue: PODS
- Year: 2017
- Pagerank: 9.2868035e-05
- Overall Rank: 2,209 | 84.64%
- DOI: 10.1145/3034786.3056124
[Chart: Incoming Non-self Citations Over Time]
Incoming Citations (Sorted by Pagerank)
Showing 11 of 11 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,914 | Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks | 2020 | SIGMOD | 0.00010109102 |
| 2,038 | The return of JedAI: End-to-End Entity Resolution for Structured and Semi-Structured Data | 2018 | VLDB | 9.7098952e-05 |
| 2,289 | Veritas: Shared Verifiable Databases and Tables in the Cloud | 2019 | CIDR | 9.0946871e-05 |
| 2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05 |
| 5,529 | Data-Driven Domain Discovery for Structured Datasets | 2020 | VLDB | 5.4566641e-05 |
| 6,263 | Equitable Data Valuation Meets the Right to Be Forgotten in Model Markets | 2023 | VLDB | 5.1349507e-05 |
| 7,634 | ReStore - Neural Data Completion for Relational Databases | 2021 | SIGMOD | 4.6911382e-05 |
| 7,795 | ForBackBench: A Benchmark for Chasing vs. Query-Rewriting | 2022 | VLDB | 4.6482625e-05 |
| 8,696 | Effective Entity Augmentation By Querying External Data Sources | 2023 | VLDB | 4.4660032e-05 |
| 9,028 | Enabling Rich Queries Over Heterogeneous Data From Diverse Sources In HealthCare | 2020 | CIDR | 4.4043898e-05 |
| 10,090 | Integrating Vector Databases across Embedding Models | 2026 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 20 of 20 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 48 | Data Integration: A Theoretical Perspective | 2002 | PODS | 0.00069720859 |
| 138 | Query Transformation for PSJ-queries | 1987 | VLDB | 0.00042334092 |
| 188 | Applying Model Management to Classical Meta Data Problems | 2003 | CIDR | 0.00035968389 |
| 398 | Big Data Integration | 2013 | VLDB | 0.00024372588 |
| 416 | Computing Queries from Derived Relations | 1985 | VLDB | 0.0002380776 |
| 489 | Data Curation at Scale: The Data Tamer System | 2013 | CIDR | 0.00022030728 |
| 578 | The GMAP: A Versatile Tool for Physical Data Independence | 1994 | VLDB | 0.00019838707 |
| 621 | Schema Mappings, Data Exchange, and Metadata Management | 2005 | PODS | 0.00019005115 |
| 712 | Magellan: Toward Building Entity Matching Management Systems | 2016 | VLDB | 0.00017732426 |
| 731 | Optimizing Queries Using Materialized Views: A Practical, Scalable Solution | 2001 | SIGMOD | 0.00017468889 |
| 809 | Curated Databases | 2008 | PODS | 0.00016430384 |
| 893 | Data Integration: The Teenage Years | 2006 | VLDB | 0.00015558352 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 1,537 | Google's Deep-Web Crawl | 2008 | VLDB | 0.00011465704 |
| 1,883 | The iBench Integration Metadata Generator | 2016 | VLDB | 0.00010215862 |
| 2,634 | STBenchmark: Towards a Benchmark for Mapping Systems | 2008 | VLDB | 8.4048633e-05 |
| 3,288 | Biperpedia: An Ontology for Search Applications | 2014 | VLDB | 7.273034e-05 |
| 4,486 | OpenII: An Open Source Information Integration Toolkit | 2010 | SIGMOD | 6.1455674e-05 |
| 7,549 | SOLOMON: Seeking the Truth Via Copying Detection | 2010 | VLDB | 4.7137426e-05 |
| 8,135 | Applying WebTables in Practice | 2015 | CIDR | 4.5777549e-05 |
Semantically Similar Papers