| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 1,541 | Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes | 2023 | CIDR | 0.00011456579 |
| 2,587 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks | 2024 | SIGMOD | 8.4924618e-05 |
| 3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05 |
| 3,114 | GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization | 2024 | VLDB | 7.5451724e-05 |
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 3,508 | SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines | 2024 | VLDB | 7.0271496e-05 |
| 3,840 | Revisiting Prompt Engineering via Declarative Crowdsourcing | 2024 | CIDR | 6.7106924e-05 |
| 3,876 | The Design of an LLM-powered Unstructured Analytics System | 2025 | CIDR | 6.6741456e-05 |
| 3,995 | How Large Language Models Will Disrupt Data Management | 2023 | VLDB | 6.5513237e-05 |
| 4,212 | Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration | 2023 | SIGMOD | 6.3555142e-05 |
| 4,535 | Hybrid Querying Over Relational Databases and Large Language Models | 2025 | CIDR | 6.1049669e-05 |
| 5,023 | GenRewrite: Query Rewriting via Large Language Models | 2026 | SIGMOD | 5.75363e-05 |
| 5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05 |
| 5,462 | RetClean: Retrieval-Based Data Cleaning Using LLMs and Data Lakes | 2024 | VLDB | 5.494769e-05 |
| 5,509 | Can Large Language Models Predict Data Correlations from Column Names? | 2023 | VLDB | 5.4703368e-05 |
| 5,928 | SchemaPile: A Large Collection of Relational Database Schemas | 2024 | SIGMOD | 5.2685946e-05 |
| 6,077 | The Fast and the Private: Task-based Dataset Search | 2024 | CIDR | 5.2229324e-05 |
| 6,092 | Observatory: Characterizing Embeddings of Relational Tables | 2024 | VLDB | 5.2138566e-05 |
| 6,553 | How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses | 2024 | VLDB | 5.0157344e-05 |
| 7,026 | Mind the Data Gap: Bridging LLMs to Enterprise Data Integration | 2025 | CIDR | 4.8570811e-05 |
| 7,048 | Magneto: Combining Small and Large Language Models for Schema Matching | 2025 | VLDB | 4.8520651e-05 |
| 7,152 | Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity | 2024 | VLDB | 4.8154191e-05 |
| 8,052 | Generating Succinct Descriptions of Database Schemata for Cost-Efficient Prompting of Large Language Models | 2024 | VLDB | 4.5953106e-05 |
| 8,204 | ELEET: Efficient Learned Query Execution over Text and Tables | 2024 | VLDB | 4.5594273e-05 |
| 8,207 | SQLStorm: Taking Database Benchmarking into the LLM Era | 2025 | VLDB | 4.5583637e-05 |
| 8,208 | SMARTFEAT: Efficient Feature Construction through Feature-Level Foundation Model Interactions | 2024 | CIDR | 4.5581306e-05 |
| 8,257 | Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines | 2023 | SIGMOD | 4.5487511e-05 |
| 8,683 | FormaT5: Abstention and Examples for Conditional Table Formatting with Natural Language | 2024 | VLDB | 4.4686885e-05 |
| 8,736 | Unveiling Challenges for LLMs in Enterprise Data Engineering | 2026 | VLDB | 4.456315e-05 |
| 8,847 | Towards Foundation Database Models | 2025 | CIDR | 4.4371897e-05 |
| 9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05 |
| 9,389 | DataVinci: Learning Syntactic and Semantic String Repairs | 2025 | SIGMOD | 4.3441378e-05 |
| 9,476 | Adda: Towards Efficient in-Database Feature Generation via LLM-based Agents | 2025 | SIGMOD | 4.3341665e-05 |
| 9,479 | Data Imputation with Limited Data Redundancy Using Data Lakes | 2025 | VLDB | 4.3341665e-05 |
| 9,492 | Lingua Manga: A Generic Large Language Model Centric System for Data Curation | 2023 | VLDB | 4.3341665e-05 |
| 9,515 | Automating the Enterprise with Foundation Models | 2024 | VLDB | 4.3335877e-05 |
| 10,022 | In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration | 2026 | SIGMOD | 4.1945683e-05 |
| 10,064 | Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees | 2026 | SIGMOD | 4.1945683e-05 |
| 10,443 | LLM-Matcher: A Name-Based Schema Matching Tool using Large Language Models | 2025 | SIGMOD | 4.1945683e-05 |
| 10,465 | A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces | 2025 | SIGMOD | 4.1945683e-05 |
| 10,512 | Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables | 2025 | SIGMOD | 4.1945683e-05 |
| 10,595 | Optimized Batch Prompting for Cost-effective LLMs | 2025 | VLDB | 4.1945683e-05 |
| 10,598 | Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence | 2025 | VLDB | 4.1945683e-05 |
| 10,610 | Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation | 2025 | VLDB | 4.1945683e-05 |
| 10,617 | Deduplicated Sampling On-Demand | 2025 | VLDB | 4.1945683e-05 |
| 10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05 |
| 10,675 | On LLM-Enhanced Mixed-Type Data Imputation with High-Order Message Passing | 2025 | VLDB | 4.1945683e-05 |
| 10,753 | Cents: A Flexible and Cost-Effective Framework for LLM-Based Table Understanding | 2025 | VLDB | 4.1945683e-05 |
| 10,835 | Large Language Models for Spatial Analysis Queries | 2025 | VLDB | 4.1945683e-05 |