Database Paper Browser

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

Summary: DocETL is a declarative framework that automatically decomposes and rewrites LLM-based document processing pipelines, using agentic "rewrite directives" and logical plan rewrites to mitigate LLM omissions. Agent-guided evaluation with latency-aware optimization yields 21–80% accuracy gains. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 13940
Venue: VLDB
Year: 2025
Pagerank: 9.929429e-05
Overall Rank: 1,963 | 86.35%
DOI: 10.14778/3746405.3746426

Authors

Incoming Citations (Sorted by Pagerank)

Showing 20 of 20 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
3,876 | The Design of an LLM-powered Unstructured Analytics System | 2025 | CIDR | 6.6741456e-05
5,171 | Abacus: A Cost-Based Optimizer for Semantic Operator Systems | 2026 | VLDB | 5.6464993e-05
5,840 | Logical and Physical Optimizations for SQL Query Execution over Large Language Models | 2025 | SIGMOD | 5.3042561e-05
7,119 | VectraFlow: Integrating Vectors into Stream Processing | 2025 | CIDR | 4.8262611e-05
8,469 | Semantic Operators and Their Optimization: Enabling LLM-Based Data Processing with Accuracy Guarantees in LOTUS | 2025 | VLDB | 4.5041113e-05
9,370 | PalimpChat: Declarative and Interactive AI analytics | 2025 | SIGMOD | 4.3480692e-05
9,729 | Semantic Integrity Constraints: Declarative Guardrails for AI-Augmented Data Processing Systems | 2025 | VLDB | 4.2942813e-05
9,968 | Please Don't Kill My Vibe: Empowering Agents with Data Flow Control | 2026 | CIDR | 4.1945683e-05
9,972 | KathDB: Explainable Multimodal Database Management System with Human-AI Collaboration | 2026 | CIDR | 4.1945683e-05
9,985 | Making Prompts First-Class Citizens for Adaptive LLM Pipelines | 2026 | CIDR | 4.1945683e-05
9,990 | Deep Research is the New Analytics System: Towards Building the Runtime for AI-Driven Analytics | 2026 | CIDR | 4.1945683e-05
10,064 | Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees | 2026 | SIGMOD | 4.1945683e-05
10,069 | Drama: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries | 2026 | SIGMOD | 4.1945683e-05
10,117 | AixelAsk: A Stepwise-Guided Retrieval and Reasoning Framework for Large Table QA | 2026 | SIGMOD | 4.1945683e-05
10,126 | Visual Template Inference for Data Extraction from Documents | 2026 | SIGMOD | 4.1945683e-05
10,144 | Beyond Relational: Semantic-Aware Multi-Modal Analytics with LLM-Native Query Optimization | 2026 | SIGMOD | 4.1945683e-05
10,194 | PRISM: Navigating Cost–Accuracy Trade-offs for NL2SQL | 2026 | SIGMOD | 4.1945683e-05
10,215 | Task Cascades for Efficient Unstructured Data Processing | 2026 | SIGMOD | 4.1945683e-05
10,320 | ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines | 2026 | VLDB | 4.1945683e-05
10,325 | KEN: An Execution Engine for Unstructured Database Systems | 2026 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 18 of 18 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
94 | CrowdDB: Answering Queries with Crowdsourcing | 2011 | SIGMOD | 0.00051013264
95 | Maintaining Views Incrementally | 1993 | SIGMOD | 0.00050896659
249 | Crowdsourced Databases: Query Processing with People | 2011 | CIDR | 0.00030740523
316 | NoScope: Optimizing Neural Network Queries over Video at Scale | 2017 | VLDB | 0.00027988668
454 | An Overview of Query Optimization in Relational Systems | 1998 | PODS | 0.00022734812
1,082 | CAESURA: Language Models as Multi-Modal Query Planners | 2024 | CIDR | 0.00014214232
1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154
1,407 | DB-BERT: A Database Tuning Tool that "Reads the Manual" | 2022 | SIGMOD | 0.00012146739
2,106 | Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing | 2025 | CIDR | 9.5342543e-05
3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05
3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05
3,508 | SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines | 2024 | VLDB | 7.0271496e-05
3,840 | Revisiting Prompt Engineering via Declarative Crowdsourcing | 2024 | CIDR | 6.7106924e-05
3,876 | The Design of an LLM-powered Unstructured Analytics System | 2025 | CIDR | 6.6741456e-05
3,995 | How Large Language Models Will Disrupt Data Management | 2023 | VLDB | 6.5513237e-05
5,171 | Abacus: A Cost-Based Optimizer for Semantic Operator Systems | 2026 | VLDB | 5.6464993e-05
5,279 | CDB: A Crowd-Powered Database System | 2018 | VLDB | 5.5902418e-05
6,092 | Observatory: Characterizing Embeddings of Relational Tables | 2024 | VLDB | 5.2138566e-05

Semantically Similar Papers