Task Cascades for Efficient Unstructured Data Processing

Summary: Task cascades generalize model cascades for LLM-based document processing by varying not only the model but also the queried document span and even the operation itself, exploiting simpler correlated sub-tasks and partial evidence. An iterative optimizer combined with statistical accuracy guarantees yields 36% lower cost than standard cascades at a 90% target accuracy. (summarized by gpt-5.4-mini on Apr 11 2026)
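
The idea in the summary can be illustrated with a minimal sketch. This is not the paper's implementation. Every name here (`Stage`, `run_cascade`, the keyword-based stand-ins for LLM calls, the cost units, and the hand-picked thresholds) is a hypothetical assumption made for illustration. Each stage fixes a model cost, a document span, and an operation. A document exits at the first stage whose confidence clears its threshold, and otherwise falls through to the expensive final stage. In a real system the thresholds would be calibrated on labeled samples to meet the accuracy target, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    cost_per_token: int        # hypothetical cost units for this stage's model
    span_fraction: float       # share of the document (a prefix) the stage reads
    operation: Callable[[str], Tuple[bool, float]]  # returns (label, confidence)
    threshold: float           # minimum confidence needed to stop at this stage


def run_cascade(doc: str, stages: List[Stage]) -> Tuple[bool, str, int]:
    """Return (label, name of the answering stage, total cost spent)."""
    if not stages:
        raise ValueError("cascade needs at least one stage")
    tokens = doc.split()
    spent = 0
    for i, stage in enumerate(stages):
        n = max(1, int(len(tokens) * stage.span_fraction))
        span = " ".join(tokens[:n])
        spent += n * stage.cost_per_token
        label, confidence = stage.operation(span)
        is_last = i == len(stages) - 1
        # The last stage always answers; earlier stages answer only if confident.
        if confidence >= stage.threshold or is_last:
            return label, stage.name, spent
    raise AssertionError("unreachable")


def mentions_refund(span: str) -> Tuple[bool, float]:
    # Stand-in for a cheap model on a simpler, correlated sub-task:
    # no refund keyword gives a confident "no"; a hit is inconclusive.
    hit = "refund" in span.lower()
    return (True, 0.5) if hit else (False, 0.95)


def is_refund_request(span: str) -> Tuple[bool, float]:
    # Stand-in for the expensive model answering the original task.
    text = span.lower()
    return ("refund" in text and "denied" not in text), 1.0


stages = [
    Stage("cheap-surrogate", 1, 0.5, mentions_refund, 0.9),
    Stage("oracle", 20, 1.0, is_refund_request, 0.0),
]

doc_a = "The weather was nice and delivery arrived on time with no issues at all"
doc_b = "I want a refund because the item arrived broken and support ignored me"

for doc in (doc_a, doc_b):
    print(run_cascade(doc, stages))
```

In this toy setup the first document is resolved by the cheap stage on half of its text, while the second is escalated to the final stage and pays for both. Reading only a prefix can miss evidence later in the document, which is why such thresholds need to be validated against labeled data.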

Paper ID: 7528
Venue: SIGMOD
Year: 2026
Pagerank: 4.1905499e-05
Overall Rank: 10,215 | 29.01%
DOI: 10.1145/3786702

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 29 of 29 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
140	Predicate Migration: Optimizing Queries with Expensive Predicates	1993	SIGMOD	0.00042289025
317	NoScope: Optimizing Neural Network Queries over Video at Scale	2017	VLDB	0.0002798145
332	Accelerating Machine Learning Inference with Probabilistic Predicates	2018	SIGMOD	0.00027173479
430	Approximate Query Processing: Taming the TeraBytes! A Tutorial	2001	VLDB	0.00023406426
531	Random Sampling for Histogram Construction: How much is enough?	1998	SIGMOD	0.0002079072
694	BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics	2020	VLDB	0.00018031141
997	CAESURA: Language Models as Multi-Modal Query Planners	2024	CIDR	0.00014726927
1,088	Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes	2024	VLDB	0.00014158762
1,574	Approximate Query Processing: No Silver Bullet	2017	SIGMOD	0.00011289028
1,839	DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing	2025	VLDB	0.00010351287
2,013	Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing	2025	CIDR	9.7986166e-05
3,465	LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency	2025	VLDB	7.0668293e-05
3,553	Approximate Selection with Guarantees using Proxies	2020	VLDB	6.9763548e-05
3,639	The Design of an LLM-powered Unstructured Analytics System	2025	CIDR	6.8886648e-05
4,405	Filtering with Approximate Predicates	1998	VLDB	6.2048285e-05
4,492	TASTI: Semantic Indexes for Machine Learning-based Queries over Unstructured Data	2022	SIGMOD	6.1374891e-05
4,703	Accelerating Approximate Aggregation Queries with Expensive Predicates	2021	VLDB	5.9793615e-05
5,149	Abacus: A Cost-Based Optimizer for Semantic Operator Systems	2026	VLDB	5.655398e-05
5,168	FiGO: Fine-Grained Query Optimization in Video Analytics	2022	SIGMOD	5.6446115e-05
5,206	ThalamusDB: Approximate Query Processing on Multi-Modal Data	2024	SIGMOD	5.625641e-05
5,756	Pneuma: Leveraging LLMs for Tabular Data Representation and Retrieval in an End-to-End System	2025	SIGMOD	5.3387063e-05
7,115	VectraFlow: Integrating Vectors into Stream Processing	2025	CIDR	4.822227e-05
7,335	SpareLLM: Automatically Selecting Task-Specific Minimum-Cost Large Language Models under Equivalence Constraint	2025	SIGMOD	4.7533835e-05
7,369	ELEET: Efficient Learned Query Execution over Text and Tables	2024	VLDB	4.7452331e-05
7,703	AOP: Automated and Interactive LLM Pipeline Orchestration for Answering Complex Queries	2025	CIDR	4.668568e-05
7,911	Accelerating Aggregation Queries on Unstructured Streams of Data	2023	VLDB	4.6143141e-05
8,464	Semantic Operators and Their Optimization: Enabling LLM-Based Data Processing with Accuracy Guarantees in LOTUS	2025	VLDB	4.5003888e-05
8,975	A Learned Query Rewrite System	2023	VLDB	4.4146872e-05
9,240	ThriftLLM: On Cost-Effective Selection of Large Language Models for Classification Queries	2025	VLDB	4.3648789e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
7,016	LLM for Data Management	2024	VLDB	4.8561622e-05
10,022	In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration	2026	SIGMOD	4.1905499e-05
1,088	Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes	2024	VLDB	0.00014158762
10,603	Optimized Batch Prompting for Cost-effective LLMs	2025	VLDB	4.1905499e-05
10,976	Unstructured Data Fusion for Schema and Data Extraction	2024	SIGMOD	4.1905499e-05
10,462	ScaleLLM: A Technique for Scalable LLM-augmented Data Systems	2025	SIGMOD	4.1905499e-05
13,148	DocDB: A Database for Unstructured Document Analysis	2025	VLDB	-
1,839	DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing	2025	VLDB	0.00010351287
10,758	QUEST: Query Optimization in Unstructured Document Analysis	2025	VLDB	4.1905499e-05
10,064	Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees	2026	SIGMOD	4.1905499e-05