DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
Summary: DocETL is a declarative framework that automatically decomposes and rewrites LLM-based document processing pipelines, using agentic "rewrite directives" and logical plan rewrites to mitigate LLM omissions. Agent-guided evaluation with latency-aware optimization yields 21–80% accuracy gains.
(summarized by gpt-5-mini on Feb 09 2026)
- Paper ID: 13940
- Venue: VLDB
- Year: 2025
- Pagerank: 9.929429e-05
- Overall Rank: 1,963 | 86.35%
- DOI: 10.14778/3746405.3746426
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 20 of 20 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,876 | The Design of an LLM-powered Unstructured Analytics System | 2025 | CIDR | 6.6741456e-05 |
| 5,171 | Abacus: A Cost-Based Optimizer for Semantic Operator Systems | 2026 | VLDB | 5.6464993e-05 |
| 5,840 | Logical and Physical Optimizations for SQL Query Execution over Large Language Models | 2025 | SIGMOD | 5.3042561e-05 |
| 7,119 | VectraFlow: Integrating Vectors into Stream Processing | 2025 | CIDR | 4.8262611e-05 |
| 8,469 | Semantic Operators and Their Optimization: Enabling LLM-Based Data Processing with Accuracy Guarantees in LOTUS | 2025 | VLDB | 4.5041113e-05 |
| 9,370 | PalimpChat: Declarative and Interactive AI analytics | 2025 | SIGMOD | 4.3480692e-05 |
| 9,729 | Semantic Integrity Constraints: Declarative Guardrails for AI-Augmented Data Processing Systems | 2025 | VLDB | 4.2942813e-05 |
| 9,968 | Please Don't Kill My Vibe: Empowering Agents with Data Flow Control | 2026 | CIDR | 4.1945683e-05 |
| 9,972 | KathDB: Explainable Multimodal Database Management System with Human-AI Collaboration | 2026 | CIDR | 4.1945683e-05 |
| 9,985 | Making Prompts First-Class Citizens for Adaptive LLM Pipelines | 2026 | CIDR | 4.1945683e-05 |
| 9,990 | Deep Research is the New Analytics System: Towards Building the Runtime for AI-Driven Analytics | 2026 | CIDR | 4.1945683e-05 |
| 10,064 | Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees | 2026 | SIGMOD | 4.1945683e-05 |
| 10,069 | Drama: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries | 2026 | SIGMOD | 4.1945683e-05 |
| 10,117 | AixelAsk: A Stepwise-Guided Retrieval and Reasoning Framework for Large Table QA | 2026 | SIGMOD | 4.1945683e-05 |
| 10,126 | Visual Template Inference for Data Extraction from Documents | 2026 | SIGMOD | 4.1945683e-05 |
| 10,144 | Beyond Relational: Semantic-Aware Multi-Modal Analytics with LLM-Native Query Optimization | 2026 | SIGMOD | 4.1945683e-05 |
| 10,194 | PRISM: Navigating Cost–Accuracy Trade-offs for NL2SQL | 2026 | SIGMOD | 4.1945683e-05 |
| 10,215 | Task Cascades for Efficient Unstructured Data Processing | 2026 | SIGMOD | 4.1945683e-05 |
| 10,320 | ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines | 2026 | VLDB | 4.1945683e-05 |
| 10,325 | KEN: An Execution Engine for Unstructured Database Systems | 2026 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 18 of 18 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 94 | CrowdDB: Answering Queries with Crowdsourcing | 2011 | SIGMOD | 0.00051013264 |
| 95 | Maintaining Views Incrementally | 1993 | SIGMOD | 0.00050896659 |
| 249 | Crowdsourced Databases: Query Processing with People | 2011 | CIDR | 0.00030740523 |
| 316 | NoScope: Optimizing Neural Network Queries over Video at Scale | 2017 | VLDB | 0.00027988668 |
| 454 | An Overview of Query Optimization in Relational Systems | 1998 | PODS | 0.00022734812 |
| 1,082 | CAESURA: Language Models as Multi-Modal Query Planners | 2024 | CIDR | 0.00014214232 |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 1,407 | DB-BERT: A Database Tuning Tool that "Reads the Manual" | 2022 | SIGMOD | 0.00012146739 |
| 2,106 | Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing | 2025 | CIDR | 9.5342543e-05 |
| 3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05 |
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 3,508 | spade: Synthesizing Data Quality Assertions for Large Language Model Pipelines | 2024 | VLDB | 7.0271496e-05 |
| 3,840 | Revisiting Prompt Engineering via Declarative Crowdsourcing | 2024 | CIDR | 6.7106924e-05 |
| 3,876 | The Design of an LLM-powered Unstructured Analytics System | 2025 | CIDR | 6.6741456e-05 |
| 3,995 | How Large Language Models Will Disrupt Data Management | 2023 | VLDB | 6.5513237e-05 |
| 5,171 | Abacus: A Cost-Based Optimizer for Semantic Operator Systems | 2026 | VLDB | 5.6464993e-05 |
| 5,279 | CDB: A Crowd-Powered Database System | 2018 | VLDB | 5.5902418e-05 |
| 6,092 | Observatory: Characterizing Embeddings of Relational Tables | 2024 | VLDB | 5.2138566e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,204 | ELEET: Efficient Learned Query Execution over Text and Tables | 2024 | VLDB | 4.5594273e-05 |
| 9,219 | Intelligent Agents for Data Exploration | 2024 | VLDB | 4.3702863e-05 |
| 10,752 | QUEST: Query Optimization in Unstructured Document Analysis | 2025 | VLDB | 4.1945683e-05 |
| 7,705 | AOP: Automated and Interactive LLM Pipeline Orchestration for Answering Complex Queries | 2025 | CIDR | 4.6730494e-05 |
| 10,316 | LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning | 2026 | VLDB | 4.1945683e-05 |
| 10,438 | Doctopus: A System for Budget-aware Structural Data Extraction from Unstructured Documents | 2025 | SIGMOD | 4.1945683e-05 |
| 10,973 | Unstructured Data Fusion for Schema and Data Extraction | 2024 | SIGMOD | 4.1945683e-05 |
| 10,215 | Task Cascades for Efficient Unstructured Data Processing | 2026 | SIGMOD | 4.1945683e-05 |
| 9,152 | Doctopus: Budget-aware Structural Table Extraction from Unstructured Documents | 2025 | VLDB | 4.3849295e-05 |
| 13,134 | DocDB: A Database for Unstructured Document Analysis | 2025 | VLDB | - |