| 939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344 |
| 1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639 |
| 1,541 | Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes | 2023 | CIDR | 0.00011456579 |
| 1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787 |
| 1,831 | Synthesizing Entity Matching Rules by Examples | 2018 | VLDB | 0.00010384082 |
| 1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378 |
| 2,209 | Data Integration: After the Teenage Years | 2017 | PODS | 9.2868035e-05 |
| 2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05 |
| 2,359 | Data Market Platforms: Trading Data Assets to Solve Data Problems | 2020 | VLDB | 8.9607667e-05 |
| 2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05 |
| 2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05 |
| 2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05 |
| 3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05 |
| 3,265 | RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! - | 2018 | VLDB | 7.3083672e-05 |
| 3,358 | Organizing Data Lakes for Navigation | 2020 | SIGMOD | 7.1784949e-05 |
| 3,467 | Data Profiling – A Tutorial | 2017 | SIGMOD | 7.069081e-05 |
| 3,824 | Correlation Sketches for Approximate Join-Correlation Queries | 2021 | SIGMOD | 6.7260705e-05 |
| 4,212 | Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration | 2023 | SIGMOD | 6.3555142e-05 |
| 4,595 | Juneau: Data Lake Management for Jupyter | 2019 | VLDB | 6.060188e-05 |
| 5,058 | A Demo of the Data Civilizer System | 2017 | SIGMOD | 5.7280139e-05 |
| 5,153 | Horizon: Scalable Dependency-driven Data Cleaning | 2021 | VLDB | 5.6607963e-05 |
| 5,179 | SilkMoth: An Efficient Method for Finding Related Sets with Maximum Matching Constraints | 2017 | VLDB | 5.6428428e-05 |
| 5,383 | Auto-Pipeline: Synthesizing Complex Data Pipelines By-Target Using Reinforcement Learning and Search | 2021 | VLDB | 5.5393038e-05 |
| 5,794 | Discovering Related Data At Scale | 2021 | VLDB | 5.3245122e-05 |
| 6,187 | Semi-Supervised Data Cleaning with Raha and Baran | 2021 | CIDR | 5.1656857e-05 |
| 6,280 | Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks | 2023 | VLDB | 5.1290457e-05 |
| 6,360 | High-Dimensional Vector Similarity Search: From Time Series to Deep Network Embeddings | 2020 | SIGMOD | 5.0961051e-05 |
| 7,303 | DICE: Data Discovery by Example | 2021 | VLDB | 4.7684686e-05 |
| 7,311 | The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development | 2020 | SIGMOD | 4.7656884e-05 |
| 7,384 | The VADA Architecture for Cost-Effective Data Wrangling | 2017 | SIGMOD | 4.7445719e-05 |
| 7,411 | ItemSuggest: A Data Management Platform for Machine Learned Ranking Services | 2019 | CIDR | 4.7364436e-05 |
| 7,643 | Cross Modal Data Discovery over Structured and Unstructured Data Lakes | 2023 | VLDB | 4.6901105e-05 |
| 7,704 | ExDRa: Exploratory Data Science on Federated Raw Data | 2021 | SIGMOD | 4.6733838e-05 |
| 7,745 | Crossing the finish line faster when paddling the Data Lake with KAYAK | 2017 | VLDB | 4.6618625e-05 |
| 7,858 | ConnectionLens: Finding Connections Across Heterogeneous Data Sources | 2018 | VLDB | 4.6342491e-05 |
| 8,000 | Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics | 2019 | VLDB | 4.6092803e-05 |
| 8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05 |
| 8,116 | LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes | 2024 | VLDB | 4.581507e-05 |
| 8,696 | Effective Entity Augmentation By Querying External Data Sources | 2023 | VLDB | 4.4660032e-05 |
| 8,729 | OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs | 2023 | VLDB | 4.4582221e-05 |
| 8,974 | DataLoom: Simplifying Data Loading with LLMs | 2024 | VLDB | 4.4184286e-05 |
| 9,253 | Glean: Structured Extractions from Templatic Documents | 2021 | VLDB | 4.3690661e-05 |
| 9,306 | Debugging Large-Scale Data Science Pipelines using Dagger | 2020 | VLDB | 4.3572942e-05 |
| 9,379 | GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example | 2023 | SIGMOD | 4.3462787e-05 |
| 9,412 | Retrofitting GDPR Compliance onto Legacy Databases | 2022 | VLDB | 4.3441378e-05 |
| 9,961 | QueryArtisan: Generating Data Manipulation Codes for Ad-hoc Analysis in Data Lakes | 2025 | VLDB | 4.2294678e-05 |
| 10,291 | Morphing-based Compression for Data-centric ML Pipelines | 2026 | VLDB | 4.1945683e-05 |
| 10,610 | Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation | 2025 | VLDB | 4.1945683e-05 |
| 10,828 | Buckaroo: A Direct Manipulation Visual Data Wrangler | 2025 | VLDB | 4.1945683e-05 |
| 11,063 | Searching Data Lakes for Nested and Joined Data | 2024 | VLDB | 4.1945683e-05 |