Organizing Data Lakes for Navigation
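Summary: Organizes a data lake as a navigation graph whose nodes are sets of attributes and whose edges follow subset relationships between those sets. Proposes a probabilistic model of how users navigate this graph and an approximate algorithm for building the organization. Experiments on real data and a user study show that navigation improves table discovery compared with keyword search.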
(summarized by gpt-5-nano on Feb 09 2026)
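The summary does not define the graph or the user model precisely. The Python sketch below is a hypothetical illustration of the idea and is not the authors' algorithm. It assumes three things:

- an edge links each attribute set to its immediate subsets among the given nodes;
- a node's topic is the mean of its attributes' embedding vectors;
- a user at a node picks a child with softmax probability over `gamma` times the cosine similarity to a query vector.

All function names, `attr_vecs`, and `gamma` are illustrative. The paper's approximate organization-search algorithm is not shown.

```python
import math

# Hypothetical sketch: the topic vectors and softmax user model below are
# assumptions for illustration, not the paper's exact definitions.


def build_subset_dag(nodes):
    """Map each attribute set to its immediate subsets among `nodes`.

    An edge parent -> child exists iff child is a proper subset of parent and
    no other node lies strictly between them (the covering relation)."""
    sets = [frozenset(n) for n in nodes]
    children = {s: [] for s in sets}
    for parent in sets:
        subsets = [c for c in sets if c < parent]
        for child in subsets:
            if not any(child < mid < parent for mid in subsets):
                children[parent].append(child)
    return children


def topic_vector(node, attr_vecs):
    """Assumed node representation: mean of its (non-empty) attribute vectors."""
    dim = len(next(iter(attr_vecs.values())))
    total = [0.0] * dim
    for attr in node:
        for i, x in enumerate(attr_vecs[attr]):
            total[i] += x
    return [x / len(node) for x in total]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def reach_probabilities(children, attr_vecs, query, root, gamma=10.0):
    """Probability that a user starting at `root` reaches each node.

    At every node the user moves to one child, chosen with softmax
    probability over gamma * cosine(child topic, query)."""
    root = frozenset(root)
    reach = {node: 0.0 for node in children}
    reach[root] = 1.0
    # Every edge points to a strictly smaller set, so visiting nodes from
    # largest to smallest processes all parents before their children.
    for node in sorted(children, key=len, reverse=True):
        kids = children[node]
        if not kids or reach[node] == 0.0:
            continue
        weights = [math.exp(gamma * cosine(topic_vector(k, attr_vecs), query))
                   for k in kids]
        total = sum(weights)
        for kid, w in zip(kids, weights):
            reach[kid] += reach[node] * w / total
    return reach


# Toy data lake: two "location" attributes and two "person" attributes.
attr_vecs = {"city": [1.0, 0.0], "zip": [0.9, 0.1],
             "salary": [0.0, 1.0], "age": [0.1, 0.9]}
nodes = [{"city", "zip", "salary", "age"}, {"city", "zip"}, {"salary", "age"},
         {"city"}, {"zip"}, {"salary"}, {"age"}]
dag = build_subset_dag(nodes)
reach = reach_probabilities(dag, attr_vecs, query=[1.0, 0.0], root=nodes[0])
```

Computing reach probabilities in size order avoids enumerating paths explicitly. With a location-like query, most of the probability mass flows to the `{city, zip}` branch.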
- Paper ID: 5818
- Venue: SIGMOD
- Year: 2020
- Pagerank: 7.1784949e-05
- Overall Rank: 3,358 | 76.65%
- DOI: 10.1145/3318464.3380605
Incoming Citations (Sorted by Pagerank)
Showing 15 of 15 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 5,381 | Selective Data Acquisition in the Wild for Model Charging | 2022 | VLDB | 5.5399508e-05 |
| 5,976 | Responsible Data Integration: Next-generation Challenges | 2022 | SIGMOD | 5.245976e-05 |
| 6,270 | MATE: Multi-Attribute Table Extraction | 2022 | VLDB | 5.1337451e-05 |
| 6,438 | RONIN: Data Lake Exploration | 2021 | VLDB | 5.0620163e-05 |
| 8,193 | WarpGate: A Semantic Join Discovery System for Cloud Data Warehouses | 2023 | CIDR | 4.5618596e-05 |
| 8,503 | A Demonstration of KGLac: A Data Discovery and Enrichment Platform for Data Science | 2021 | VLDB | 4.496339e-05 |
| 8,917 | Data Lakes Empowered by Knowledge Graph Technologies | 2021 | SIGMOD | 4.427232e-05 |
| 9,928 | Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search | 2024 | VLDB | 4.2511622e-05 |
| 10,197 | Qualitative Join Discovery in Data Lakes using Examples | 2026 | SIGMOD | 4.1945683e-05 |
| 10,341 | A Theoretical Framework for Distribution-Aware Dataset Search | 2025 | PODS | 4.1945683e-05 |
| 10,685 | LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes | 2025 | VLDB | 4.1945683e-05 |
| 10,836 | Data Discovery in Data Lakes: Operations, Indexes, Systems | 2025 | VLDB | 4.1945683e-05 |
| 10,951 | Determining the Largest Overlap between Tables | 2024 | SIGMOD | 4.1945683e-05 |
| 11,379 | Fast Dataset Search with Earth Mover’s Distance | 2022 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 14 of 14 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 364 | Annotating and Searching Web Tables Using Entities, Types and Relationships | 2010 | VLDB | 0.00025637562 |
| 518 | Data Integration for the Relational Web | 2009 | VLDB | 0.00021158934 |
| 610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674 |
| 818 | Finding Related Tables | 2012 | SIGMOD | 0.00016311524 |
| 1,001 | Recovering Semantics of Tables on the Web | 2011 | VLDB | 0.00014706505 |
| 1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118 |
| 1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 1,367 | Answering Table Queries on the Web using Column Keywords | 2012 | VLDB | 0.00012349783 |
| 2,141 | LSH Ensemble: Internet-Scale Domain Search | 2016 | VLDB | 9.4542625e-05 |
| 2,269 | Ground: A Data Context Service | 2017 | CIDR | 9.147379e-05 |
| 5,789 | Interactive Navigation of Open Data Linkages | 2017 | VLDB | 5.3269741e-05 |
| 6,576 | Supporting Keyword Search in Product Database: A Probabilistic Approach | 2013 | VLDB | 5.0046315e-05 |
| 6,845 | Facet Discovery for Structured Web Search: A Query-log Mining Approach | 2011 | SIGMOD | 4.9092609e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,917 | Data Lakes Empowered by Knowledge Graph Technologies | 2021 | SIGMOD | 4.427232e-05 |
| 13,277 | The Challenge of Building Effective Data Lakes | 2020 | SIGMOD | - |
| 7,643 | Cross Modal Data Discovery over Structured and Unstructured Data Lakes | 2023 | VLDB | 4.6901105e-05 |
| 1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787 |
| 1,510 | Summarizing Relational Databases | 2009 | VLDB | 0.00011606901 |
| 10,685 | LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes | 2025 | VLDB | 4.1945683e-05 |
| 1,605 | Addressing Diverse User Preferences in SQL-Query-Result Navigation | 2007 | SIGMOD | 0.00011186762 |
| 11,063 | Searching Data Lakes for Nested and Joined Data | 2024 | VLDB | 4.1945683e-05 |
| 7,582 | LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes | 2024 | VLDB | 4.7046388e-05 |
| 13,602 | Information Discovery in Loosely Integrated Data | 2007 | SIGMOD | - |