Database Paper Browser

Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes

Summary: Evaporate is an LLM-based system that converts heterogeneous documents into queryable tables using in-context learning rather than domain-specific training. Its Evaporate-Code+ variant ensembles many synthesized extractors with weak supervision, approaching or exceeding the quality of direct extraction while making only a sublinear number of LLM passes over the documents (≈110× fewer document calls). (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 13765
Venue: VLDB
Year: 2024
Pagerank: 0.00013890154
Overall Rank: 1,116 | 92.24%
DOI: 10.14778/3626292.3626294

Incoming Citations (Sorted by Pagerank)

Showing 27 of 27 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,963 | DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | 2025 | VLDB | 9.929429e-05
2,106 | Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing | 2025 | CIDR | 9.5342543e-05
3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05
3,876 | The Design of an LLM-powered Unstructured Analytics System | 2025 | CIDR | 6.6741456e-05
5,509 | Can Large Language Models Predict Data Correlations from Column Names? | 2023 | VLDB | 5.4703368e-05
5,658 | Databases Unbound: Querying All of the World's Bytes with AI | 2024 | VLDB | 5.385675e-05
5,840 | Logical and Physical Optimizations for SQL Query Execution over Large Language Models | 2025 | SIGMOD | 5.3042561e-05
7,026 | Mind the Data Gap: Bridging LLMs to Enterprise Data Integration | 2025 | CIDR | 4.8570811e-05
7,705 | AOP: Automated and Interactive LLM Pipeline Orchestration for Answering Complex Queries | 2025 | CIDR | 4.6730494e-05
8,186 | E2ETune: End-to-End Knob Tuning via Fine-tuned Generative Language Model | 2025 | VLDB | 4.5651684e-05
8,204 | ELEET: Efficient Learned Query Execution over Text and Tables | 2024 | VLDB | 4.5594273e-05
8,469 | Semantic Operators and Their Optimization: Enabling LLM-Based Data Processing with Accuracy Guarantees in LOTUS | 2025 | VLDB | 4.5041113e-05
8,488 | Can Large Language Models Be Query Optimizer for Relational Databases? | 2026 | SIGMOD | 4.4998609e-05
8,520 | mLoRA: Fine-Tuning LoRA Adapters via Highly-Efficient Pipeline Parallelism in Multiple GPUs | 2025 | VLDB | 4.4937074e-05
9,152 | Doctopus: Budget-aware Structural Table Extraction from Unstructured Documents | 2025 | VLDB | 4.3849295e-05
9,972 | KathDB: Explainable Multimodal Database Management System with Human-AI Collaboration | 2026 | CIDR | 4.1945683e-05
10,064 | Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees | 2026 | SIGMOD | 4.1945683e-05
10,115 | ST-Raptor: LLM-Powered Semi-Structured Table Question Answering | 2026 | SIGMOD | 4.1945683e-05
10,126 | Visual Template Inference for Data Extraction from Documents | 2026 | SIGMOD | 4.1945683e-05
10,215 | Task Cascades for Efficient Unstructured Data Processing | 2026 | SIGMOD | 4.1945683e-05
10,438 | Doctopus: A System for Budget-aware Structural Data Extraction from Unstructured Documents | 2025 | SIGMOD | 4.1945683e-05
10,455 | Sentence to Model: Cost-Effective Data Collection LLM Agent | 2025 | SIGMOD | 4.1945683e-05
10,456 | SwellDB: Dynamic Query-Driven Table Generation with Large Language Models | 2025 | SIGMOD | 4.1945683e-05
10,595 | Optimized Batch Prompting for Cost-effective LLMs | 2025 | VLDB | 4.1945683e-05
10,713 | CoLA: Model Collaboration for Log-based Anomaly Detection | 2025 | VLDB | 4.1945683e-05
10,752 | QUEST: Query Optimization in Unstructured Document Analysis | 2025 | VLDB | 4.1945683e-05
11,068 | Chameleon: Foundation Models for Fairness-aware Multi-modal Data Augmentation to Enhance Coverage of Minorities | 2024 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 11 of 11 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers