Doctopus: A System for Budget-aware Structural Data Extraction from Unstructured Documents
Summary: Doctopus combines LLM-based attribute extraction with non-LLM methods under a budget-aware optimizer to extract structured data from unstructured documents. It chunks documents to narrow extraction to the relevant content, estimates the quality of each strategy per attribute, and selects the best strategy for each attribute within the budget. (summarized by gpt-5-nano on Feb 09 2026)
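The budget-aware selection step mentioned in the summary can be illustrated with a minimal sketch. This is not the paper's optimizer: it is a simple greedy heuristic, and the `Strategy` class, the `pick_strategies` function, the cost units, and the quality numbers are all hypothetical names and values introduced here for illustration. It assumes each attribute has a few candidate strategies, each with an estimated cost and an estimated quality, and that exactly one strategy must be chosen per attribute without exceeding a total budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    cost: float     # hypothetical cost units, e.g. LLM tokens
    quality: float  # hypothetical estimated accuracy in [0, 1]


def pick_strategies(options, budget):
    """Greedy heuristic: options maps attribute -> list of Strategy."""
    # Start from the cheapest strategy for every attribute.
    chosen = {a: min(s, key=lambda x: x.cost) for a, s in options.items()}
    spent = sum(s.cost for s in chosen.values())
    if spent > budget:
        raise ValueError("budget too small even for the cheapest strategies")

    while True:
        best = None  # (quality gain per unit of extra cost, attribute, strategy)
        for attr, strategies in options.items():
            current = chosen[attr]
            for s in strategies:
                extra = s.cost - current.cost
                gain = s.quality - current.quality
                if gain <= 0 or spent + extra > budget:
                    continue  # no improvement, or not affordable
                ratio = gain / extra if extra > 0 else float("inf")
                if best is None or ratio > best[0]:
                    best = (ratio, attr, s)
        if best is None:
            return chosen  # no affordable upgrade remains
        _, attr, s = best
        spent += s.cost - chosen[attr].cost
        chosen[attr] = s


# Illustrative inputs only; these numbers are made up.
options = {
    "author": [Strategy("regex", 0, 0.6), Strategy("llm", 10, 0.95)],
    "date": [Strategy("regex", 0, 0.9), Strategy("llm", 10, 0.95)],
}
plan = pick_strategies(options, budget=10)
print({attr: s.name for attr, s in plan.items()})
```

With a budget that covers only one LLM call, the heuristic spends it on the attribute where the estimated quality gain per unit cost is largest. A greedy pass like this is not guaranteed to be optimal; an exact solution would treat the problem as a multiple-choice knapsack.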
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Yuanhao Zhong
2. Yuhao Deng
3. Chengliang Chai
4. Ruixin Gu
5. Ye Yuan
6. Guoren Wang
7. Lei Cao
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 2 of 2 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 1,395 | Structured Querying of Web Text: A Technical Challenge | 2007 | CIDR | 0.00012207039 |