| 939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344 |
| 1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787 |
| 2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05 |
| 3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05 |
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 3,358 | Organizing Data Lakes for Navigation | 2020 | SIGMOD | 7.1784949e-05 |
| 3,824 | Correlation Sketches for Approximate Join-Correlation Queries | 2021 | SIGMOD | 6.7260705e-05 |
| 4,540 | Automating Exploratory Data Analysis via Machine Learning: An Overview | 2020 | SIGMOD | 6.1033443e-05 |
| 4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05 |
| 5,024 | Towards Distribution-aware Query Answering in Data Markets | 2022 | VLDB | 5.7535043e-05 |
| 5,280 | Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V | 2023 | VLDB | 5.5896735e-05 |
| 5,658 | Databases Unbound: Querying All of the World's Bytes with AI | 2024 | VLDB | 5.385675e-05 |
| 5,691 | Putting Things into Context: Rich Explanations for Query Answers using Join Graphs | 2021 | SIGMOD | 5.3684557e-05 |
| 5,794 | Discovering Related Data At Scale | 2021 | VLDB | 5.3245122e-05 |
| 5,952 | Eraser: Eliminating Performance Regression on Learned Query Optimizer | 2024 | VLDB | 5.2591691e-05 |
| 5,976 | Responsible Data Integration: Next-generation Challenges | 2022 | SIGMOD | 5.245976e-05 |
| 6,092 | Observatory: Characterizing Embeddings of Relational Tables | 2024 | VLDB | 5.2138566e-05 |
| 6,270 | MATE: Multi-Attribute Table Extraction | 2022 | VLDB | 5.1337451e-05 |
| 6,449 | Causal Data Integration | 2023 | VLDB | 5.0587746e-05 |
| 7,582 | LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes | 2024 | VLDB | 4.7046388e-05 |
| 7,643 | Cross Modal Data Discovery over Structured and Unstructured Data Lakes | 2023 | VLDB | 4.6901105e-05 |
| 8,116 | LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes | 2024 | VLDB | 4.581507e-05 |
| 8,193 | WarpGate: A Semantic Join Discovery System for Cloud Data Warehouses | 2023 | CIDR | 4.5618596e-05 |
| 8,503 | A Demonstration of KGLac: A Data Discovery and Enrichment Platform for Data Science | 2021 | VLDB | 4.496339e-05 |
| 8,618 | Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data | 2024 | SIGMOD | 4.4838259e-05 |
| 8,696 | Effective Entity Augmentation By Querying External Data Sources | 2023 | VLDB | 4.4660032e-05 |
| 8,736 | Unveiling Challenges for LLMs in Enterprise Data Engineering | 2026 | VLDB | 4.456315e-05 |
| 8,910 | R2D2: Reducing Redundancy and Duplication in Data Lakes | 2023 | SIGMOD | 4.427232e-05 |
| 8,917 | Data Lakes Empowered by Knowledge Graph Technologies | 2021 | SIGMOD | 4.427232e-05 |
| 8,974 | DataLoom: Simplifying Data Loading with LLMs | 2024 | VLDB | 4.4184286e-05 |
| 9,703 | CaJaDE: Explaining Query Results by Augmenting Provenance with Context | 2022 | VLDB | 4.3005882e-05 |
| 9,961 | QueryArtisan: Generating Data Manipulation Codes for Ad-hoc Analysis in Data Lakes | 2025 | VLDB | 4.2294678e-05 |
| 10,109 | Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations | 2026 | SIGMOD | 4.1945683e-05 |
| 10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05 |
| 10,197 | Qualitative Join Discovery in Data Lakes using Examples | 2026 | SIGMOD | 4.1945683e-05 |
| 10,510 | Table Overlap Estimation through Graph Embeddings | 2025 | SIGMOD | 4.1945683e-05 |
| 10,589 | Birdie: Natural Language-Driven Table Discovery Using Differentiable Search Index | 2025 | VLDB | 4.1945683e-05 |
| 10,598 | Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence | 2025 | VLDB | 4.1945683e-05 |
| 10,685 | LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes | 2025 | VLDB | 4.1945683e-05 |
| 10,725 | Suna: Scalable Causal Confounder Discovery over Relational Data | 2025 | VLDB | 4.1945683e-05 |
| 10,754 | OmniMatch: Joinability Discovery in Data Products | 2025 | VLDB | 4.1945683e-05 |
| 10,823 | TableCopilot: A Table Assistant Empowered by Natural Language Conditional Table Discovery | 2025 | VLDB | 4.1945683e-05 |
| 10,836 | Data Discovery in Data Lakes: Operations, Indexes, Systems | 2025 | VLDB | 4.1945683e-05 |
| 10,951 | Determining the Largest Overlap between Tables | 2024 | SIGMOD | 4.1945683e-05 |
| 11,054 | Enriching Relations with Additional Attributes for ER | 2024 | VLDB | 4.1945683e-05 |
| 11,063 | Searching Data Lakes for Nested and Joined Data | 2024 | VLDB | 4.1945683e-05 |
| 11,168 | Weighted Minwise Hashing Beats Linear Sketching for Inner Product Estimation | 2023 | PODS | 4.1945683e-05 |
| 11,247 | A Two-Level Signature Scheme for Stable Set Similarity Joins | 2023 | VLDB | 4.1945683e-05 |
| 11,305 | TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching | 2023 | VLDB | 4.1945683e-05 |
| 11,379 | Fast Dataset Search with Earth Mover’s Distance | 2022 | VLDB | 4.1945683e-05 |