Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees
Summary: Targets the cost-quality tradeoff in LLM cascades by deciding, with provable guarantees, when a cheaper LLM can be used to process a record, addressing the weak quality estimation of prior confidence-based cascades. BARGAIN uses adaptive sampling and statistical estimation tuned to the data and task to give tight theoretical guarantees (accuracy, precision, recall) and empirically reduces cost by up to 86% compared with the state of the art.
(summarized by gpt-5-mini on Feb 11 2026)
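As a rough illustration of what "choosing when to use the cheaper LLM with a guarantee" can look like, here is a minimal sketch of a confidence-threshold cascade. It is not BARGAIN's procedure: it assumes a fixed uniform random sample labeled by the expensive model (treated as ground truth) and uses a plain Hoeffding bound with a union bound over candidate thresholds, where the paper describes adaptive sampling and tighter, data-aware estimation. All function and parameter names are hypothetical.

```python
import math


def hoeffding_lower_bound(successes, n, delta):
    """Lower confidence bound on a Bernoulli mean; holds with probability >= 1 - delta."""
    if n == 0:
        return 0.0
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return max(0.0, successes / n - slack)


def pick_threshold(sample, target, delta, candidates):
    """Pick the lowest confidence threshold that certifies the accuracy target.

    sample: (cheap_confidence, cheap_was_correct) pairs from a uniform random
            sample labeled by the expensive model (assumption of this sketch).
    Returns None when no candidate can be certified, meaning the cheap model
    should not be trusted on any record.
    """
    per_test_delta = delta / len(candidates)  # union bound over all candidates
    for t in sorted(candidates):
        accepted = [ok for conf, ok in sample if conf >= t]
        bound = hoeffding_lower_bound(sum(accepted), len(accepted), per_test_delta)
        if bound >= target:
            return t
    return None


def run_cascade(records, cheap, expensive, threshold):
    """Keep the cheap answer when its confidence clears the threshold, else escalate."""
    outputs = []
    for record in records:
        label, conf = cheap(record)
        if threshold is not None and conf >= threshold:
            outputs.append(label)
        else:
            outputs.append(expensive(record))
    return outputs
```

Because records below the threshold go to the expensive model, certifying the cheap model's accuracy on the accepted records is enough to bound overall accuracy in this simplified setting. A lower threshold saves more cost, which is why the sketch scans candidates in ascending order.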
- Paper ID: 7372
- Venue: SIGMOD
- Year: 2026
- Pagerank: 4.1945683e-05
- Overall Rank: 10,064 | 29.99%
- DOI: 10.1145/3769776
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 22 of 22 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 316 | NoScope: Optimizing Neural Network Queries over Video at Scale | 2017 | VLDB | 0.00027988668 |
| 329 | Accelerating Machine Learning Inference with Probabilistic Predicates | 2018 | SIGMOD | 0.00027249545 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 696 | BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics | 2020 | VLDB | 0.00018048935 |
| 1,082 | CAESURA: Language Models as Multi-Modal Query Planners | 2024 | CIDR | 0.00014214232 |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 1,963 | DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | 2025 | VLDB | 9.929429e-05 |
| 3,558 | Approximate Selection with Guarantees using Proxies | 2020 | VLDB | 6.9765724e-05 |
| 3,876 | The Design of an LLM-powered Unstructured Analytics System | 2025 | CIDR | 6.6741456e-05 |
| 3,995 | How Large Language Models Will Disrupt Data Management | 2023 | VLDB | 6.5513237e-05 |
| 4,501 | TASTI: Semantic Indexes for Machine Learning-based Queries over Unstructured Data | 2022 | SIGMOD | 6.137686e-05 |
| 4,712 | Accelerating Approximate Aggregation Queries with Expensive Predicates | 2021 | VLDB | 5.9787986e-05 |
| 4,865 | OTIF: Efficient Tracker Pre-processing over Large Video Datasets | 2022 | SIGMOD | 5.8684353e-05 |
| 5,173 | FiGO: Fine-Grained Query Optimization in Video Analytics | 2022 | SIGMOD | 5.6447253e-05 |
| 5,462 | RetClean: Retrieval-Based Data Cleaning Using LLMs and Data Lakes | 2024 | VLDB | 5.494769e-05 |
| 7,705 | AOP: Automated and Interactive LLM Pipeline Orchestration for Answering Complex Queries | 2025 | CIDR | 4.6730494e-05 |
| 7,868 | Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach | 2023 | SIGMOD | 4.6319504e-05 |
| 7,928 | Accelerating Aggregation Queries on Unstructured Streams of Data | 2023 | VLDB | 4.613455e-05 |
| 8,204 | ELEET: Efficient Learned Query Execution over Text and Tables | 2024 | VLDB | 4.5594273e-05 |
| 9,235 | ThriftLLM: On Cost-Effective Selection of Large Language Models for Classification Queries | 2025 | VLDB | 4.3690661e-05 |
| 9,351 | On Efficient Approximate Queries over Machine Learning Models | 2023 | VLDB | 4.3524472e-05 |
| 9,729 | Semantic Integrity Constraints: Declarative Guardrails for AI-Augmented Data Processing Systems | 2025 | VLDB | 4.2942813e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 13,138 | Database Perspective on LLM Inference Systems | 2025 | VLDB | - |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 3,840 | Revisiting Prompt Engineering via Declarative Crowdsourcing | 2024 | CIDR | 6.7106924e-05 |
| 10,316 | LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning | 2026 | VLDB | 4.1945683e-05 |
| 7,020 | LLM for Data Management | 2024 | VLDB | 4.8595728e-05 |
| 10,452 | ScaleLLM: A Technique for Scalable LLM-augmented Data Systems | 2025 | SIGMOD | 4.1945683e-05 |
| 7,339 | SpareLLM: Automatically Selecting Task-Specific Minimum-Cost Large Language Models under Equivalence Constraint | 2025 | SIGMOD | 4.7579469e-05 |
| 10,215 | Task Cascades for Efficient Unstructured Data Processing | 2026 | SIGMOD | 4.1945683e-05 |
| 10,595 | Optimized Batch Prompting for Cost-effective LLMs | 2025 | VLDB | 4.1945683e-05 |
| 9,235 | ThriftLLM: On Cost-Effective Selection of Large Language Models for Classification Queries | 2025 | VLDB | 4.3690661e-05 |