Finding Related Tables in Data Lakes for Interactive Data Science
Summary: Integrates data-lake search into Jupyter Notebook for interactive data science. The system finds joinable or linkable tables, schemas, and workflows to augment training data, extract features, and clean data; its core methods generalize to program or script executions.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 5941
- Venue: SIGMOD
- Year: 2020
- Pagerank: 0.00011041787
- Overall Rank: 1,644 | 88.57%
- DOI: 10.1145/3318464.3389726
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 27 of 27 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05 |
| 3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05 |
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 3,824 | Correlation Sketches for Approximate Join-Correlation Queries | 2021 | SIGMOD | 6.7260705e-05 |
| 4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05 |
| 6,270 | MATE: Multi-Attribute Table Extraction | 2022 | VLDB | 5.1337451e-05 |
| 6,438 | RONIN: Data Lake Exploration | 2021 | VLDB | 5.0620163e-05 |
| 6,449 | Causal Data Integration | 2023 | VLDB | 5.0587746e-05 |
| 7,868 | Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach | 2023 | SIGMOD | 4.6319504e-05 |
| 8,193 | WarpGate: A Semantic Join Discovery System for Cloud Data Warehouses | 2023 | CIDR | 4.5618596e-05 |
| 8,503 | A Demonstration of KGLac: A Data Discovery and Enrichment Platform for Data Science | 2021 | VLDB | 4.496339e-05 |
| 8,910 | R2D2: Reducing Redundancy and Duplication in Data Lakes | 2023 | SIGMOD | 4.427232e-05 |
| 8,917 | Data Lakes Empowered by Knowledge Graph Technologies | 2021 | SIGMOD | 4.427232e-05 |
| 9,928 | Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search | 2024 | VLDB | 4.2511622e-05 |
| 10,341 | A Theoretical Framework for Distribution-Aware Dataset Search | 2025 | PODS | 4.1945683e-05 |
| 10,364 | A Rank-Based Approach to Recommender System’s Top-K Queries with Uncertain Scores | 2025 | SIGMOD | 4.1945683e-05 |
| 10,510 | Table Overlap Estimation through Graph Embeddings | 2025 | SIGMOD | 4.1945683e-05 |
| 10,540 | Discovering Approximate Inclusion Dependencies | 2025 | VLDB | 4.1945683e-05 |
| 10,685 | LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes | 2025 | VLDB | 4.1945683e-05 |
| 10,754 | OmniMatch: Joinability Discovery in Data Products | 2025 | VLDB | 4.1945683e-05 |
| 10,820 | APEX-DAG: Library and Language independent Pipeline EXtraction | 2025 | VLDB | 4.1945683e-05 |
| 10,836 | Data Discovery in Data Lakes: Operations, Indexes, Systems | 2025 | VLDB | 4.1945683e-05 |
| 10,951 | Determining the Largest Overlap between Tables | 2024 | SIGMOD | 4.1945683e-05 |
| 11,054 | Enriching Relations with Additional Attributes for ER | 2024 | VLDB | 4.1945683e-05 |
| 11,063 | Searching Data Lakes for Nested and Joined Data | 2024 | VLDB | 4.1945683e-05 |
| 11,379 | Fast Dataset Search with Earth Mover’s Distance | 2022 | VLDB | 4.1945683e-05 |
| 11,420 | Detecting Layout Templates in Complex Multiregion Files | 2022 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 21 of 21 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7 | Optimal Aggregation Algorithms for Middleware [Extended Abstract] | 2001 | PODS | 0.0015496097 |
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684 |
| 224 | CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies | 2004 | SIGMOD | 0.00032746205 |
| 610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674 |
| 674 | Supporting Top-k Join Queries in Relational Databases | 2003 | VLDB | 0.00018327585 |
| 903 | To Join or Not to Join? Thinking Twice about Joins before Feature Selection | 2016 | SIGMOD | 0.0001547016 |
| 939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344 |
| 951 | Comparing Stars: On Approximating Graph Edit Distance | 2009 | VLDB | 0.00015106325 |
| 1,001 | Recovering Semantics of Tables on the Web | 2011 | VLDB | 0.00014706505 |
| 1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118 |
| 1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639 |
| 1,262 | RankSQL: Query Algebra and Optimization for Relational Top-k Queries | 2005 | SIGMOD | 0.00012986539 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 1,367 | Answering Table Queries on the Web using Column Keywords | 2012 | VLDB | 0.00012349783 |
| 2,141 | LSH Ensemble: Internet-Scale Domain Search | 2016 | VLDB | 9.4542625e-05 |
| 3,110 | Learning to Create Data-Integrating Queries | 2008 | VLDB | 7.5475982e-05 |
| 3,155 | Ten Years of WebTables | 2018 | VLDB | 7.4672742e-05 |
| 3,281 | Constance: An Intelligent Data Lake System | 2016 | SIGMOD | 7.2823287e-05 |
| 4,464 | Magellan: Toward Building Entity Matching Management Systems over Data Science Stacks | 2016 | VLDB | 6.1606042e-05 |
| 4,595 | Juneau: Data Lake Management for Jupyter | 2019 | VLDB | 6.060188e-05 |
| 6,355 | User Feedback as a First Class Citizen in Information Integration Systems | 2011 | CIDR | 5.0987661e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344 |
| 8,917 | Data Lakes Empowered by Knowledge Graph Technologies | 2021 | SIGMOD | 4.427232e-05 |
| 10,685 | LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes | 2025 | VLDB | 4.1945683e-05 |
| 818 | Finding Related Tables | 2012 | SIGMOD | 0.00016311524 |
| 8,116 | LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes | 2024 | VLDB | 4.581507e-05 |
| 6,981 | Dataset Relationship Management | 2019 | CIDR | 4.8743957e-05 |
| 5,794 | Discovering Related Data At Scale | 2021 | VLDB | 5.3245122e-05 |
| 7,582 | LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes | 2024 | VLDB | 4.7046388e-05 |
| 4,595 | Juneau: Data Lake Management for Jupyter | 2019 | VLDB | 6.060188e-05 |
| 11,063 | Searching Data Lakes for Nested and Joined Data | 2024 | VLDB | 4.1945683e-05 |