Database Paper Browser

Determining the Largest Overlap between Tables

Summary: Formalizes the largest-table-overlap problem and presents Sloth, an efficient solver for the largest common subtable under arbitrary row/column permutations. Real-world evaluation highlights its utility for version discovery, data cleaning, and deduplication across data lakes and web tables. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6857
Venue: SIGMOD
Year: 2024
Pagerank: 4.1945683e-05
Overall Rank: 10,951 | 23.82%
DOI: 10.1145/3639303

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 1 of 1 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
10,510 | Table Overlap Estimation through Graph Embeddings | 2025 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 24 of 24 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684
364 | Annotating and Searching Web Tables Using Entities, Types and Relationships | 2010 | VLDB | 0.00025637562
513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342
518 | Data Integration for the Relational Web | 2009 | VLDB | 0.00021158934
746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979
818 | Finding Related Tables | 2012 | SIGMOD | 0.00016311524
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
1,211 | Truth Finding on the Deep Web: Is the Problem Solved? | 2013 | VLDB | 0.00013257101
1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787
1,914 | Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks | 2020 | SIGMOD | 0.00010109102
2,141 | LSH Ensemble: Internet-Scale Domain Search | 2016 | VLDB | 9.4542625e-05
3,358 | Organizing Data Lakes for Navigation | 2020 | SIGMOD | 7.1784949e-05
3,520 | GitTables: A Large-Scale Corpus of Relational Tables | 2023 | SIGMOD | 7.0131061e-05
3,797 | Stitching Web Tables for Improving Matching Quality | 2017 | VLDB | 6.7597149e-05
4,353 | Overlap Set Similarity Joins with Theoretical Guarantees | 2018 | SIGMOD | 6.263585e-05
4,784 | Divide & Conquer-based Inclusion Dependency Discovery | 2015 | VLDB | 5.9240851e-05
4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05
5,179 | SilkMoth: An Efficient Method for Finding Related Sets with Maximum Matching Constraints | 2017 | VLDB | 5.6428428e-05
5,449 | Transformers for Tabular Data Representation: A Tutorial on Models and Applications | 2022 | VLDB | 5.5008652e-05
5,506 | Exploring Change – A New Dimension of Data Analytics | 2019 | VLDB | 5.473324e-05
6,270 | MATE: Multi-Attribute Table Extraction | 2022 | VLDB | 5.1337451e-05
8,949 | Discovering Similarity Inclusion Dependencies | 2023 | SIGMOD | 4.4234478e-05

Semantically Similar Papers