Database Paper Browser


Searching Data Lakes for Nested and Joined Data

Summary: Extends Juneau to search data lakes for hierarchical (nested) matches: rather than returning single tables, it synthesizes views that join and transform multiple tables to match a JSON or tabular search object. Proposes a framework for merging ranked results, with new indexing and sketching techniques, that integrates single-table search. Reports up to 4.8x faster retrieval, 43% more related results, a 28% gain in domain coverage, and measurable improvements in downstream ML tasks. (summarized by gpt-5-mini on Feb 09 2026)
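The summary mentions merging ranked results from several searches. As a generic illustration only, and not the paper's own framework, the sketch below implements the Threshold Algorithm from the cited PODS 2001 paper "Optimal Aggregation Algorithms for Middleware" (Fagin, Lotem, Naor). It assumes non-negative scores, lists sorted by descending score, a score of 0.0 for a candidate absent from a list, and a monotone aggregate (sum by default). The function name `threshold_topk` and the example candidates and scores are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

# A ranked list: (candidate_id, score) pairs sorted by descending score.
RankedList = List[Tuple[str, float]]


def threshold_topk(
    ranked_lists: Sequence[RankedList],
    k: int,
    aggregate: Callable[[Sequence[float]], float] = sum,
) -> List[Tuple[str, float]]:
    """Top-k merge of several ranked lists with Fagin's Threshold Algorithm.

    Assumes (hypothetically) non-negative scores, lists sorted by descending
    score, 0.0 for a candidate absent from a list, and a monotone `aggregate`.
    """
    # Random-access index per list: candidate_id -> score.
    lookups: List[Dict[str, float]] = [dict(lst) for lst in ranked_lists]
    totals: Dict[str, float] = {}  # exact aggregated score of each candidate seen
    max_len = max((len(lst) for lst in ranked_lists), default=0)

    for depth in range(max_len):
        frontier: List[float] = []  # score at the current depth in each list
        for lst in ranked_lists:
            if depth < len(lst):
                cand, score = lst[depth]  # sorted access
                frontier.append(score)
                if cand not in totals:
                    # Random access into every list to compute the full score.
                    totals[cand] = aggregate([lk.get(cand, 0.0) for lk in lookups])
            else:
                frontier.append(0.0)  # exhausted list: unseen candidates score 0.0 there
        # No unseen candidate can aggregate above this threshold.
        threshold = aggregate(frontier)
        top = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top  # early stop: the k-th best already reaches the threshold
    # All lists fully scanned: every candidate has an exact score.
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Hypothetical scores from two single-table searches over the same candidates.
    by_schema = [("A", 0.9), ("B", 0.8), ("C", 0.3)]
    by_join = [("B", 0.9), ("C", 0.7), ("A", 0.2)]
    print(threshold_topk([by_schema, by_join], k=2))
```

Stopping as soon as the k-th best exact score reaches the threshold is what lets this style of merge avoid scanning every list to the end; how the paper itself ranks and merges nested or joined results is not described on this page.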

Paper ID: 13547
Venue: VLDB
Year: 2024
Pagerank: 4.1945683e-05
Overall Rank: 11,063 | 23.04%
DOI: 10.14778/3681954.3682005

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank

Outgoing Citations (Sorted by Pagerank)

Showing 23 of 23 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
7 | Optimal Aggregation Algorithms for Middleware [Extended Abstract] | 2001 | PODS | 0.0015496097
36 | Fast Algorithms for Mining Association Rules | 1994 | VLDB | 0.00076161096
107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684
153 | Relational Databases for Querying XML Documents: Limitations and Opportunities | 1999 | VLDB | 0.00040784455
207 | Storing Semistructured Data with STORED | 1999 | SIGMOD | 0.00034611968
513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342
552 | Supporting Incremental Join Queries on Ranked Inputs | 2001 | VLDB | 0.00020310903
610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
1,001 | Recovering Semantics of Tables on the Web | 2011 | VLDB | 0.00014706505
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695
1,367 | Answering Table Queries on the Web using Column Keywords | 2012 | VLDB | 0.00012349783
1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787
2,141 | LSH Ensemble: Internet-Scale Domain Search | 2016 | VLDB | 9.4542625e-05
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05
3,155 | Ten Years of WebTables | 2018 | VLDB | 7.4672742e-05
3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05
3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05
4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05
7,052 | Pre-trained Embeddings for Entity Resolution: An Experimental Analysis | 2023 | VLDB | 4.8497453e-05
7,838 | Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes | 2021 | SIGMOD | 4.6377995e-05

Semantically Similar Papers