Task Cascades for Efficient Unstructured Data Processing
Summary: Task cascades generalize model cascades for LLM-based document processing by varying not only the model but also the span of the document that is queried and even the operation itself, exploiting simpler correlated sub-tasks and partial evidence. An iterative optimizer combined with statistical accuracy guarantees yields 36% lower cost than standard cascades at a 90% target accuracy.
(summarized by gpt-5.4-mini on Apr 11 2026)
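To make the idea in the summary concrete, the sketch below shows a generic cascade in which each stage may read only a leading fraction of the document and may ask a simpler surrogate question, deferring to a full-document oracle when it is not confident. It is an illustration only, not the paper's algorithm: the names `Stage`, `run_cascade`, `keyword_proxy`, and `oracle`, the keyword-based stand-ins for model calls, the per-character cost units, and the hand-picked threshold are all assumptions. The paper's iterative optimizer and statistical accuracy guarantees are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A prediction is a (label, confidence) pair; confidence lies in [0, 1].
Prediction = Tuple[bool, float]


@dataclass
class Stage:
    """One hypothetical cascade stage: a cheap task run on part of a document."""
    name: str
    run: Callable[[str], Prediction]  # stand-in for a model call
    fraction: float                   # leading share of the document the stage reads
    threshold: float                  # minimum confidence needed to accept the answer
    cost_per_char: float              # illustrative cost unit, not a real price


def run_cascade(
    doc: str,
    stages: List[Stage],
    oracle: Callable[[str], Prediction],
    oracle_cost_per_char: float,
) -> Tuple[bool, float, str]:
    """Return (label, total cost, name of the stage that answered)."""
    cost = 0.0
    for stage in stages:
        # Each stage sees only a prefix of the document (partial evidence).
        span = doc[: max(1, int(len(doc) * stage.fraction))]
        label, confidence = stage.run(span)
        cost += len(span) * stage.cost_per_char
        if confidence >= stage.threshold:
            return label, cost, stage.name
    # No stage was confident enough: pay for the full-document oracle.
    label, _ = oracle(doc)
    cost += len(doc) * oracle_cost_per_char
    return label, cost, "oracle"


def keyword_proxy(span: str) -> Prediction:
    """Toy surrogate task: finding the keyword is strong evidence, absence is not."""
    found = "refund" in span.lower()
    return (True, 0.95) if found else (False, 0.5)


def oracle(doc: str) -> Prediction:
    """Toy stand-in for an expensive model that reads the whole document."""
    return ("refund" in doc.lower(), 1.0)


STAGES = [Stage("keyword_proxy", keyword_proxy, fraction=0.5,
                threshold=0.9, cost_per_char=0.1)]

DOC_A = "refund requested: item arrived damaged"
DOC_B = ("The parcel arrived late and the box was crushed. "
         "The customer wants a refund.")

result_a = run_cascade(DOC_A, STAGES, oracle, oracle_cost_per_char=1.0)
result_b = run_cascade(DOC_B, STAGES, oracle, oracle_cost_per_char=1.0)
```

In this toy run, `DOC_A` is resolved by the cheap stage from its first half alone, while `DOC_B` falls through to the oracle because the keyword appears only near the end. A real system would need to choose the stages and thresholds from data rather than by hand.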
- Paper ID: 7527
- Venue: SIGMOD
- Year: 2026
- Pagerank: 4.1945683e-05
- Overall Rank: 10,215 | 28.94%
- DOI: 10.1145/3786702
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
Outgoing Citations (Sorted by Pagerank)
Showing 29 of 29 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 139 | Predicate Migration: Optimizing Queries with Expensive Predicates | 1993 | SIGMOD | 0.00042299329 |
| 316 | NoScope: Optimizing Neural Network Queries over Video at Scale | 2017 | VLDB | 0.00027988668 |
| 329 | Accelerating Machine Learning Inference with Probabilistic Predicates | 2018 | SIGMOD | 0.00027249545 |
| 449 | Approximate Query Processing: Taming the TeraBytes! A Tutorial | 2001 | VLDB | 0.00022846068 |
| 530 | Random Sampling for Histogram Construction: How much is enough? | 1998 | SIGMOD | 0.00020803682 |
| 696 | BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics | 2020 | VLDB | 0.00018048935 |
| 1,082 | CAESURA: Language Models as Multi-Modal Query Planners | 2024 | CIDR | 0.00014214232 |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 1,574 | Approximate Query Processing: No Silver Bullet | 2017 | SIGMOD | 0.00011287495 |
| 1,963 | DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | 2025 | VLDB | 9.929429e-05 |
| 2,106 | Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing | 2025 | CIDR | 9.5342543e-05 |
| 3,472 | LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency | 2025 | VLDB | 7.0639229e-05 |
| 3,558 | Approximate Selection with Guarantees using Proxies | 2020 | VLDB | 6.9765724e-05 |
| 3,876 | The Design of an LLM-powered Unstructured Analytics System | 2025 | CIDR | 6.6741456e-05 |
| 4,407 | Filtering with Approximate Predicates | 1998 | VLDB | 6.2133426e-05 |
| 4,501 | TASTI: Semantic Indexes for Machine Learning-based Queries over Unstructured Data | 2022 | SIGMOD | 6.137686e-05 |
| 4,712 | Accelerating Approximate Aggregation Queries with Expensive Predicates | 2021 | VLDB | 5.9787986e-05 |
| 5,171 | Abacus: A Cost-Based Optimizer for Semantic Operator Systems | 2026 | VLDB | 5.6464993e-05 |
| 5,173 | FiGO: Fine-Grained Query Optimization in Video Analytics | 2022 | SIGMOD | 5.6447253e-05 |
| 5,214 | ThalamusDB: Approximate Query Processing on Multi-Modal Data | 2024 | SIGMOD | 5.624434e-05 |
| 6,217 | Pneuma: Leveraging LLMs for Tabular Data Representation and Retrieval in an End-to-End System | 2025 | SIGMOD | 5.1534752e-05 |
| 7,119 | VectraFlow: Integrating Vectors into Stream Processing | 2025 | CIDR | 4.8262611e-05 |
| 7,339 | SpareLLM: Automatically Selecting Task-Specific Minimum-Cost Large Language Models under Equivalence Constraint | 2025 | SIGMOD | 4.7579469e-05 |
| 7,705 | AOP: Automated and Interactive LLM Pipeline Orchestration for Answering Complex Queries | 2025 | CIDR | 4.6730494e-05 |
| 7,928 | Accelerating Aggregation Queries on Unstructured Streams of Data | 2023 | VLDB | 4.613455e-05 |
| 8,204 | ELEET: Efficient Learned Query Execution over Text and Tables | 2024 | VLDB | 4.5594273e-05 |
| 8,469 | Semantic Operators and Their Optimization: Enabling LLM-Based Data Processing with Accuracy Guarantees in LOTUS | 2025 | VLDB | 4.5041113e-05 |
| 8,969 | A Learned Query Rewrite System | 2023 | VLDB | 4.4189226e-05 |
| 9,235 | ThriftLLM: On Cost-Effective Selection of Large Language Models for Classification Queries | 2025 | VLDB | 4.3690661e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 7,020 | LLM for Data Management | 2024 | VLDB | 4.8595728e-05 |
| 10,022 | In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration | 2026 | SIGMOD | 4.1945683e-05 |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 10,595 | Optimized Batch Prompting for Cost-effective LLMs | 2025 | VLDB | 4.1945683e-05 |
| 10,973 | Unstructured Data Fusion for Schema and Data Extraction | 2024 | SIGMOD | 4.1945683e-05 |
| 10,452 | ScaleLLM: A Technique for Scalable LLM-augmented Data Systems | 2025 | SIGMOD | 4.1945683e-05 |
| 13,134 | DocDB: A Database for Unstructured Document Analysis | 2025 | VLDB | - |
| 1,963 | DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | 2025 | VLDB | 9.929429e-05 |
| 10,752 | QUEST: Query Optimization in Unstructured Document Analysis | 2025 | VLDB | 4.1945683e-05 |
| 10,064 | Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees | 2026 | SIGMOD | 4.1945683e-05 |