Database Paper Browser

Integrating Data Lake Tables

Summary: ALITE is the first scalable system to compute the Full Disjunction for integrating tables discovered from data lakes (via join, union, or related-table search). It relaxes the assumptions of identical attribute names, completeness (no nulls), and acyclic joins, empirically outperforms prior algorithms, and supplies three real-data benchmarks. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 13344
Venue: VLDB
Year: 2023
Pagerank: 5.8732433e-05
Overall Rank: 4,859 | 66.20%
DOI: 10.14778/3574245.3574274

Incoming Citations (Sorted by Pagerank)

Showing 15 of 15 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05
5,280 | Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V | 2023 | VLDB | 5.5896735e-05
6,262 | Fast Shapley Value Computation in Data Assemblage Tasks as Cooperative Simple Games | 2024 | SIGMOD | 5.1349507e-05
9,399 | TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations | 2025 | VLDB | 4.3441378e-05
9,961 | QueryArtisan: Generating Data Manipulation Codes for Ad-hoc Analysis in Data Lakes | 2025 | VLDB | 4.2294678e-05
10,109 | Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations | 2026 | SIGMOD | 4.1945683e-05
10,197 | Qualitative Join Discovery in Data Lakes using Examples | 2026 | SIGMOD | 4.1945683e-05
10,510 | Table Overlap Estimation through Graph Embeddings | 2025 | SIGMOD | 4.1945683e-05
10,540 | Discovering Approximate Inclusion Dependencies | 2025 | VLDB | 4.1945683e-05
10,725 | Suna: Scalable Causal Confounder Discovery over Relational Data | 2025 | VLDB | 4.1945683e-05
10,753 | Cents: A Flexible and Cost-Effective Framework for LLM-Based Table Understanding | 2025 | VLDB | 4.1945683e-05
10,754 | OmniMatch: Joinability Discovery in Data Products | 2025 | VLDB | 4.1945683e-05
10,951 | Determining the Largest Overlap between Tables | 2024 | SIGMOD | 4.1945683e-05
11,063 | Searching Data Lakes for Nested and Joined Data | 2024 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 30 of 30 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
8 | Extending the Data Base Relational Model to Capture More Meaning | 1979 | SIGMOD | 0.0015385917
303 | Generic Schema Matching with Cupid | 2001 | VLDB | 0.00028301477
319 | Evaluation of entity resolution approaches on real-world match problems | 2010 | VLDB | 0.00027781866
364 | Annotating and Searching Web Tables Using Entities, Types and Relationships | 2010 | VLDB | 0.00025637562
382 | COMA - A system for flexible combination of schema matching approaches | 2002 | VLDB | 0.00024823252
513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342
518 | Data Integration for the Relational Web | 2009 | VLDB | 0.00021158934
818 | Finding Related Tables | 2012 | SIGMOD | 0.00016311524
916 | On Schema Matching with Opaque Column Names and Data Values | 2003 | SIGMOD | 0.00015379422
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
1,032 | Outerjoins as Disjunctions | 1994 | SIGMOD | 0.00014544529
1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787
1,914 | Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks | 2020 | SIGMOD | 0.00010109102
2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05
2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
2,966 | Integrating Information by Outerjoins and Full Disjunctions (Extended Abstract) | 1996 | PODS | 7.8002072e-05
3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05
3,735 | Auto-Join: Joining Tables by Leveraging Transformations | 2017 | VLDB | 6.8061318e-05
3,797 | Stitching Web Tables for Improving Matching Quality | 2017 | VLDB | 6.7597149e-05
3,823 | Automatic Discovery of Attributes in Relational Databases | 2011 | SIGMOD | 6.7261168e-05
4,801 | CLAMS: Bringing Quality to Data Lakes | 2016 | SIGMOD | 5.9115269e-05
5,141 | Full Disjunctions: Polynomial-Delay Iterators in Action | 2006 | VLDB | 5.6673499e-05
5,789 | Interactive Navigation of Open Data Linkages | 2017 | VLDB | 5.3269741e-05
6,438 | RONIN: Data Lake Exploration | 2021 | VLDB | 5.0620163e-05
7,613 | ADnEV: Cross-Domain Schema Matching using Deep Similarity Matrix Adjustment and Evaluation | 2020 | VLDB | 4.6961059e-05
8,072 | An Incremental Algorithm for Computing Ranked Full Disjunctions | 2005 | PODS | 4.5922874e-05
9,511 | Computing Full Disjunctions | 2003 | PODS | 4.3340927e-05

Semantically Similar Papers