Database Paper Browser

Open Data Integration

Summary: The paper shifts the focus of data integration from traditional query discovery to scalable data discovery for open data, where schemas and content are unknown and repositories are internet-scale. It outlines a research agenda and reports progress toward scalable, query-aware discovery algorithms with high recall and accuracy across massive data repositories. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11725
Venue: VLDB
Year: 2018
Pagerank: 8.2126735e-05
Overall Rank: 2,730 | 81.01%
DOI: 10.14778/3229863.3240491

Incoming Citations (Sorted by Pagerank)

Showing 22 of 22 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05
3,400 | ELPIS: Graph-Based Similarity Search for Scalable Data Science | 2023 | VLDB | 7.1405533e-05
3,963 | Pytheas: Pattern-based Table Discovery in CSV Files | 2020 | VLDB | 6.5840643e-05
4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05
5,280 | Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V | 2023 | VLDB | 5.5896735e-05
6,233 | Mosaic: A Sample-Based Database System for Open World Query Processing | 2020 | CIDR | 5.1451876e-05
6,262 | Fast Shapley Value Computation in Data Assemblage Tasks as Cooperative Simple Games | 2024 | SIGMOD | 5.1349507e-05
6,270 | MATE: Multi-Attribute Table Extraction | 2022 | VLDB | 5.1337451e-05
6,360 | High-Dimensional Vector Similarity Search: From Time Series to Deep Network Embeddings | 2020 | SIGMOD | 5.0961051e-05
6,475 | Explain3D: Explaining Disagreements in Disjoint Datasets | 2019 | VLDB | 5.0497183e-05
7,048 | Magneto: Combining Small and Large Language Models for Schema Matching | 2025 | VLDB | 4.8520651e-05
7,602 | Causal Feature Selection for Algorithmic Fairness | 2022 | SIGMOD | 4.6988081e-05
8,579 | RECA: Related Tables Enhanced Column Semantic Type Annotation Framework | 2023 | VLDB | 4.4922446e-05
9,399 | TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations | 2025 | VLDB | 4.3441378e-05
9,773 | EquiTensors: Learning Fair Integrations of Heterogeneous Urban Data | 2021 | SIGMOD | 4.2856106e-05
9,977 | A Vision for Autonomous Data Agent Collaboration: From Query-by-Integration to Query-by-Collaboration | 2026 | CIDR | 4.1945683e-05
10,090 | Integrating Vector Databases across Embedding Models | 2026 | SIGMOD | 4.1945683e-05
10,711 | Cracking Vector Search Indexes | 2025 | VLDB | 4.1945683e-05
11,389 | CDI-E: An Elastic Cloud Service for Data Engineering | 2022 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 31 of 31 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684
127 | Querying Heterogeneous Information Sources Using Source Descriptions | 1996 | VLDB | 0.00044642203
151 | Optimizing Queries across Diverse Data Sources | 1997 | VLDB | 0.00041016476
173 | Schema Mapping as Query Discovery | 2000 | VLDB | 0.00038627829
302 | Relative Information Capacity of Simple Relational Database Schemata | 1984 | PODS | 0.00028316973
420 | InfoGather: Entity Augmentation and Attribute Discovery By Holistic Matching with Web Tables | 2012 | SIGMOD | 0.00023719065
518 | Data Integration for the Relational Web | 2009 | VLDB | 0.00021158934
610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674
818 | Finding Related Tables | 2012 | SIGMOD | 0.00016311524
902 | Statistical Schema Matching across Web Query Interfaces | 2003 | SIGMOD | 0.00015486247
916 | On Schema Matching with Opaque Column Names and Data Values | 2003 | SIGMOD | 0.00015379422
1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118
1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695
1,301 | The Use of Information Capacity in Schema Integration and Translation | 1993 | VLDB | 0.00012706678
1,367 | Answering Table Queries on the Web using Column Keywords | 2012 | VLDB | 0.00012349783
1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748
1,511 | Using Schematically Heterogeneous Structures | 1998 | SIGMOD | 0.00011602872
1,883 | The iBench Integration Metadata Generator | 2016 | VLDB | 0.00010215862
1,950 | Leveraging Data and Structure in Ontology Integration | 2007 | SIGMOD | 9.9756731e-05
2,141 | LSH Ensemble: Internet-Scale Domain Search | 2016 | VLDB | 9.4542625e-05
3,459 | An Empirical Evaluation of Set Similarity Join Techniques | 2016 | VLDB | 7.072508e-05
3,735 | Auto-Join: Joining Tables by Leveraging Transformations | 2017 | VLDB | 6.8061318e-05
3,769 | A Data Transformation System for Biological Data Sources | 1995 | VLDB | 6.7782158e-05
3,797 | Stitching Web Tables for Improving Matching Quality | 2017 | VLDB | 6.7597149e-05
3,830 | ++Spicy: an Open-Source Tool for Second-Generation Schema Mapping and Data Exchange | 2011 | VLDB | 6.7193951e-05
3,992 | Discovering Linkage Points over Web Data | 2013 | VLDB | 6.5544834e-05
5,536 | On Indexing Error-Tolerant Set Containment | 2010 | SIGMOD | 5.4532734e-05
5,571 | HAMSTER: Using Search Clicklogs for Schema and Taxonomy Matching | 2009 | VLDB | 5.4283499e-05
5,789 | Interactive Navigation of Open Data Linkages | 2017 | VLDB | 5.3269741e-05
7,668 | Human-in-the-loop Data Integration | 2017 | VLDB | 4.6834075e-05
9,182 | Leveraging Query Logs for Schema Mapping Generation in U-MAP | 2011 | SIGMOD | 4.3806885e-05

Semantically Similar Papers