Searching Data Lakes for Nested and Joined Data
Summary: Extends Juneau to search data lakes for hierarchical (nested) matches. Instead of returning single tables, it synthesizes views that join and transform multiple tables to match a JSON or tabular search object. Proposes a framework for merging ranked results, with new indexing and sketching techniques, that integrates single-table search. Reports up to 4.8x faster retrieval, 43% more related results, +28% domain coverage, and measurable improvements in downstream ML tasks.
(summarized by gpt-5-mini on Feb 09 2026)
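The summary does not say how ranked results from single-table and multi-table search are merged. The paper cites Fagin et al.'s "Optimal Aggregation Algorithms for Middleware". As an assumption, the following is a minimal Python sketch of the Threshold Algorithm (TA) from that work, which combines several score-sorted candidate lists into a top-k. The function name `threshold_topk`, the sum aggregate, the non-negative scores, and scoring a candidate absent from a list as 0.0 are all illustrative choices, not the paper's design.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

# A ranked list: (candidate_id, score) pairs sorted by descending score.
RankedList = List[Tuple[str, float]]


def threshold_topk(
    lists: Sequence[RankedList],
    k: int,
    agg: Callable[[Sequence[float]], float] = sum,
) -> List[Tuple[str, float]]:
    """Return the top-k candidates by aggregate score (Fagin-style TA sketch).

    Illustrative assumptions: scores are non-negative, `agg` is monotone,
    and a candidate missing from a list scores 0.0 in that list.
    """
    if k <= 0:
        return []
    # Random access: per-list dictionaries from candidate_id to score.
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap of (aggregate, candidate_id)
    max_len = max((len(lst) for lst in lists), default=0)
    for depth in range(max_len):
        frontier: List[float] = []
        for lst in lists:
            if depth >= len(lst):
                frontier.append(0.0)  # exhausted list: unseen candidates score 0.0 here
                continue
            cand, score = lst[depth]  # sorted access
            frontier.append(score)
            if cand in seen:
                continue
            seen.add(cand)
            total = agg([lk.get(cand, 0.0) for lk in lookup])
            if len(top) < k:
                heapq.heappush(top, (total, cand))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, cand))
        # Best aggregate any not-yet-seen candidate could still reach.
        threshold = agg(frontier)
        if len(top) == k and top[0][0] >= threshold:
            break  # stop early: no unseen candidate can beat the current top-k
    return [(cand, total) for total, cand in sorted(top, reverse=True)]


if __name__ == "__main__":
    # Hypothetical scores from two rankers (e.g. single-table vs. joined-view search).
    single_table = [("a", 0.9), ("b", 0.8), ("c", 0.1)]
    joined_view = [("b", 0.7), ("c", 0.6), ("a", 0.2)]
    print(threshold_topk([single_table, joined_view], k=1))  # [('b', 1.5)]
```

TA stops once the k-th best aggregate reaches the threshold, so on the example above it reads only two rows of each list.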
- Paper ID: 13547
- Venue: VLDB
- Year: 2024
- Pagerank: 4.1945683e-05
- Overall Rank: 11,063 | 23.04%
- DOI: 10.14778/3681954.3682005
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 23 of 23 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7 | Optimal Aggregation Algorithms for Middleware [Extended Abstract] | 2001 | PODS | 0.0015496097 |
| 36 | Fast Algorithms for Mining Association Rules | 1994 | VLDB | 0.00076161096 |
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684 |
| 153 | Relational Databases for Querying XML Documents: Limitations and Opportunities | 1999 | VLDB | 0.00040784455 |
| 207 | Storing Semistructured Data with STORED | 1999 | SIGMOD | 0.00034611968 |
| 513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342 |
| 552 | Supporting Incremental Join Queries on Ranked Inputs | 2001 | VLDB | 0.00020310903 |
| 610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674 |
| 939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344 |
| 1,001 | Recovering Semantics of Tables on the Web | 2011 | VLDB | 0.00014706505 |
| 1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 1,367 | Answering Table Queries on the Web using Column Keywords | 2012 | VLDB | 0.00012349783 |
| 1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787 |
| 2,141 | LSH Ensemble: Internet-Scale Domain Search | 2016 | VLDB | 9.4542625e-05 |
| 2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05 |
| 3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05 |
| 3,155 | Ten Years of WebTables | 2018 | VLDB | 7.4672742e-05 |
| 3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05 |
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05 |
| 7,052 | Pre-trained Embeddings for Entity Resolution: An Experimental Analysis | 2023 | VLDB | 4.8497453e-05 |
| 7,838 | Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes | 2021 | SIGMOD | 4.6377995e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,792 | Automatically Incorporating New Sources in Keyword Search-Based Data Integration | 2010 | SIGMOD | 4.9249098e-05 |
| 1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639 |
| 1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118 |
| 3,358 | Organizing Data Lakes for Navigation | 2020 | SIGMOD | 7.1784949e-05 |
| 2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05 |
| 5,794 | Discovering Related Data At Scale | 2021 | VLDB | 5.3245122e-05 |
| 3,824 | Correlation Sketches for Approximate Join-Correlation Queries | 2021 | SIGMOD | 6.7260705e-05 |
| 8,116 | LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes | 2024 | VLDB | 4.581507e-05 |
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787 |