Doctopus: Budget-aware Structural Table Extraction from Unstructured Documents
Summary: Doctopus is a budget-aware system that combines LLMs with cheaper non-LLM strategies for structural attribute extraction and uses index-based chunk retrieval to minimize token costs. Per-attribute quality estimation and cost-constrained optimization select the strategy for each attribute; the reported result is +11% quality at equal cost on a four-dataset benchmark. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Chengliang Chai
2. Jiajun Li
3. Yuhao Deng
4. Yuanhao Zhong
5. Ye Yuan
6. Guoren Wang
7. Lei Cao
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,144 | Beyond Relational: Semantic-Aware Multi-Modal Analytics with LLM-Native Query Optimization | 2026 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824 |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 1,541 | Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes | 2023 | CIDR | 0.00011456579 |
| 2,057 | From Natural Language Processing to Neural Databases | 2021 | VLDB | 9.6624862e-05 |
| 5,214 | ThalamusDB: Approximate Query Processing on Multi-Modal Data | 2024 | SIGMOD | 5.624434e-05 |