Database Paper Browser

Finding Related Tables in Data Lakes for Interactive Data Science

Summary: Integrates data-lake search into Jupyter Notebook for interactive data science. The system finds joinable or linkable tables, schemas, and workflows in order to augment training data, extract features, and clean data; its core methods generalize to program or script executions. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 5941
Venue: SIGMOD
Year: 2020
Pagerank: 0.00011041787
Overall Rank: 1,644 | 88.57%
DOI: 10.1145/3318464.3389726

[Figure: Incoming Non-self Citations Over Time]

Authors

Incoming Citations (Sorted by Pagerank)

Showing 27 of 27 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05
3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05
3,824 | Correlation Sketches for Approximate Join-Correlation Queries | 2021 | SIGMOD | 6.7260705e-05
4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05
6,270 | MATE: Multi-Attribute Table Extraction | 2022 | VLDB | 5.1337451e-05
6,438 | RONIN: Data Lake Exploration | 2021 | VLDB | 5.0620163e-05
6,449 | Causal Data Integration | 2023 | VLDB | 5.0587746e-05
7,868 | Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach | 2023 | SIGMOD | 4.6319504e-05
8,193 | WarpGate: A Semantic Join Discovery System for Cloud Data Warehouses | 2023 | CIDR | 4.5618596e-05
8,503 | A Demonstration of KGLac: A Data Discovery and Enrichment Platform for Data Science | 2021 | VLDB | 4.496339e-05
8,910 | R2D2: Reducing Redundancy and Duplication in Data Lakes | 2023 | SIGMOD | 4.427232e-05
8,917 | Data Lakes Empowered by Knowledge Graph Technologies | 2021 | SIGMOD | 4.427232e-05
9,928 | Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search | 2024 | VLDB | 4.2511622e-05
10,341 | A Theoretical Framework for Distribution-Aware Dataset Search | 2025 | PODS | 4.1945683e-05
10,364 | A Rank-Based Approach to Recommender System’s Top-K Queries with Uncertain Scores | 2025 | SIGMOD | 4.1945683e-05
10,510 | Table Overlap Estimation through Graph Embeddings | 2025 | SIGMOD | 4.1945683e-05
10,540 | Discovering Approximate Inclusion Dependencies | 2025 | VLDB | 4.1945683e-05
10,685 | LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes | 2025 | VLDB | 4.1945683e-05
10,754 | OmniMatch: Joinability Discovery in Data Products | 2025 | VLDB | 4.1945683e-05
10,820 | APEX-DAG: Library and Language independent Pipeline EXtraction | 2025 | VLDB | 4.1945683e-05
10,836 | Data Discovery in Data Lakes: Operations, Indexes, Systems | 2025 | VLDB | 4.1945683e-05
10,951 | Determining the Largest Overlap between Tables | 2024 | SIGMOD | 4.1945683e-05
11,054 | Enriching Relations with Additional Attributes for ER | 2024 | VLDB | 4.1945683e-05
11,063 | Searching Data Lakes for Nested and Joined Data | 2024 | VLDB | 4.1945683e-05
11,379 | Fast Dataset Search with Earth Mover’s Distance | 2022 | VLDB | 4.1945683e-05
11,420 | Detecting Layout Templates in Complex Multiregion Files | 2022 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 21 of 21 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
7 | Optimal Aggregation Algorithms for Middleware [Extended Abstract] | 2001 | PODS | 0.0015496097
107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684
224 | CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies | 2004 | SIGMOD | 0.00032746205
610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674
674 | Supporting Top-k Join Queries in Relational Databases | 2003 | VLDB | 0.00018327585
903 | To Join or Not to Join? Thinking Twice about Joins before Feature Selection | 2016 | SIGMOD | 0.0001547016
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
951 | Comparing Stars: On Approximating Graph Edit Distance | 2009 | VLDB | 0.00015106325
1,001 | Recovering Semantics of Tables on the Web | 2011 | VLDB | 0.00014706505
1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
1,262 | RankSQL: Query Algebra and Optimization for Relational Top-k Queries | 2005 | SIGMOD | 0.00012986539
1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695
1,367 | Answering Table Queries on the Web using Column Keywords | 2012 | VLDB | 0.00012349783
2,141 | LSH Ensemble: Internet-Scale Domain Search | 2016 | VLDB | 9.4542625e-05
3,110 | Learning to Create Data-Integrating Queries | 2008 | VLDB | 7.5475982e-05
3,155 | Ten Years of WebTables | 2018 | VLDB | 7.4672742e-05
3,281 | Constance: An Intelligent Data Lake System | 2016 | SIGMOD | 7.2823287e-05
4,464 | Magellan: Toward Building Entity Matching Management Systems over Data Science Stacks | 2016 | VLDB | 6.1606042e-05
4,595 | Juneau: Data Lake Management for Jupyter | 2019 | VLDB | 6.060188e-05
6,355 | User Feedback as a First Class Citizen in Information Integration Systems | 2011 | CIDR | 5.0987661e-05

Semantically Similar Papers