| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787 |
| 2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05 |
| 3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05 |
| 3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05 |
| 3,114 | GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization | 2024 | VLDB | 7.5451724e-05 |
| 4,749 | Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models | 2021 | SIGMOD | 5.9503689e-05 |
| 4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05 |
| 4,863 | Data-Sharing Markets: Model, Protocol, and Algorithms to Incentivize the Formation of Data-Sharing Consortia | 2023 | SIGMOD | 5.8697471e-05 |
| 4,957 | Doing More with Less: Characterizing Dataset Downsampling for AutoML | 2021 | VLDB | 5.8035715e-05 |
| 5,280 | Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V | 2023 | VLDB | 5.5896735e-05 |
| 5,529 | Data-Driven Domain Discovery for Structured Datasets | 2020 | VLDB | 5.4566641e-05 |
| 5,794 | Discovering Related Data At Scale | 2021 | VLDB | 5.3245122e-05 |
| 5,928 | SchemaPile: A Large Collection of Relational Database Schemas | 2024 | SIGMOD | 5.2685946e-05 |
| 6,081 | Subgraph Matching over Graph Federation | 2022 | VLDB | 5.2208051e-05 |
| 6,438 | RONIN: Data Lake Exploration | 2021 | VLDB | 5.0620163e-05 |
| 6,526 | Data Collection and Quality Challenges for Deep Learning | 2020 | VLDB | 5.0267429e-05 |
| 7,643 | Cross Modal Data Discovery over Structured and Unstructured Data Lakes | 2023 | VLDB | 4.6901105e-05 |
| 8,008 | Entity Resolution On-Demand | 2022 | VLDB | 4.6067684e-05 |
| 8,116 | LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes | 2024 | VLDB | 4.581507e-05 |
| 8,608 | Unity Catalog: Open and Universal Governance for the Lakehouse and Beyond | 2025 | SIGMOD | 4.4853979e-05 |
| 8,729 | OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs | 2023 | VLDB | 4.4582221e-05 |
| 8,974 | DataLoom: Simplifying Data Loading with LLMs | 2024 | VLDB | 4.4184286e-05 |
| 9,701 | Towards Functional Decomposition of Storage Formats | 2025 | CIDR | 4.3008468e-05 |
| 9,773 | EquiTensors: Learning Fair Integrations of Heterogeneous Urban Data | 2021 | SIGMOD | 4.2856106e-05 |
| 9,961 | QueryArtisan: Generating Data Manipulation Codes for Ad-hoc Analysis in Data Lakes | 2025 | VLDB | 4.2294678e-05 |
| 10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05 |
| 10,197 | Qualitative Join Discovery in Data Lakes using Examples | 2026 | SIGMOD | 4.1945683e-05 |
| 10,510 | Table Overlap Estimation through Graph Embeddings | 2025 | SIGMOD | 4.1945683e-05 |
| 10,645 | OpenForge: Probabilistic Metadata Integration | 2025 | VLDB | 4.1945683e-05 |
| 10,797 | A Demonstration of QueryArtisan: Real-Time Data Lake Analysis via Dynamically Generated Data Manipulation Code | 2025 | VLDB | 4.1945683e-05 |
| 10,803 | GraphAr: An Efficient Storage Scheme for Graph Data in Data Lakes | 2025 | VLDB | 4.1945683e-05 |
| 10,829 | Sort it Like You Mean It: Discovering Semantically Interesting Attribute Augmentations to Sort Tables | 2025 | VLDB | 4.1945683e-05 |
| 10,854 | LiquidCache: Efficient Pushdown Caching for Cloud-Native Data Analytics | 2025 | VLDB | 4.1945683e-05 |
| 10,895 | Towards an Objective Metric for Data Value Through Relevance | 2024 | CIDR | 4.1945683e-05 |
| 10,951 | Determining the Largest Overlap between Tables | 2024 | SIGMOD | 4.1945683e-05 |
| 11,006 | FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous Data | 2024 | VLDB | 4.1945683e-05 |
| 11,063 | Searching Data Lakes for Nested and Joined Data | 2024 | VLDB | 4.1945683e-05 |
| 11,076 | KGFabric: A Scalable Knowledge Graph Warehouse for Enterprise Data Interconnection | 2024 | VLDB | 4.1945683e-05 |
| 11,420 | Detecting Layout Templates in Complex Multiregion Files | 2022 | VLDB | 4.1945683e-05 |