DocDB: A Database for Unstructured Document Analysis

Summary: DocDB targets LLM extraction bottlenecks with a two-level index that retrieves only relevant text segments, reducing costly attribute extractions. It adds adaptive per-document planning to minimize LLM invocations and enable low-cost SQL-like analysis of unstructured documents. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 14164
Venue: VLDB
Year: 2025
Pagerank: -
Overall Rank: 13,148 | 8.63%
DOI: 10.14778/3750601.3750678

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 1 of 1 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
2,013	Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing	2025	CIDR	9.7986166e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
11,429	Accelerating Queries over Unstructured Data with ML	2021	CIDR	4.1905499e-05
5,669	Databases Unbound: Querying All of the World's Bytes with AI	2024	VLDB	5.3805024e-05
6,202	Chat2Data: An Interactive Data Analysis System with RAG, Vector Databases and LLMs	2024	VLDB	5.1554849e-05
10,285	Relational Deep Dive: Error-Aware Queries Over Unstructured Data	2026	VLDB	4.1905499e-05
10,215	Task Cascades for Efficient Unstructured Data Processing	2026	SIGMOD	4.1905499e-05
10,976	Unstructured Data Fusion for Schema and Data Extraction	2024	SIGMOD	4.1905499e-05
1,839	DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing	2025	VLDB	0.00010351287
10,758	QUEST: Query Optimization in Unstructured Document Analysis	2025	VLDB	4.1905499e-05
9,152	Doctopus: Budget-aware Structural Table Extraction from Unstructured Documents	2025	VLDB	4.380727e-05
10,448	Doctopus: A System for Budget-aware Structural Data Extraction from Unstructured Documents	2025	SIGMOD	4.1905499e-05