Database Paper Browser


Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees

Summary: Addresses the cost-quality tradeoff in LLM cascades for record-level data processing by deciding, with provable guarantees, when a cheaper LLM's output can be used in place of an expensive one. This fixes the weak quality estimation of prior confidence-based cascades. BARGAIN uses adaptive sampling and statistical estimation tailored to the data and task to give tight theoretical guarantees on accuracy, precision, or recall, and it empirically reduces cost by up to 86% compared with the state of the art. (Summarized by gpt-5-mini on Feb 11 2026.)
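The summary describes BARGAIN only at a high level. To make the underlying idea concrete, here is a minimal, non-adaptive sketch of a guaranteed cascade; it is not the paper's algorithm. It assumes a binary labelling task, a cheap LLM that reports a confidence score per record, and a uniform random sample already labelled by the expensive LLM (treated as ground truth). It picks the lowest confidence threshold whose cheap-model accuracy is certified by a one-sided Hoeffding bound, with a union bound over candidate thresholds. That textbook bound is a deliberately simple stand-in for BARGAIN's tighter adaptive estimation. All names (`Record`, `choose_threshold`, `run_cascade`, `synthetic`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Record:
    cheap_label: int   # label emitted by the cheap LLM (binary task: 0 or 1)
    cheap_conf: float  # cheap LLM's confidence score in [0, 1]
    true_label: int    # label the expensive LLM would emit (treated as ground truth)


def hoeffding_lower(successes: int, n: int, delta: float) -> float:
    """One-sided Hoeffding lower confidence bound on a Bernoulli mean."""
    if n == 0:
        return 0.0
    return successes / n - math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def choose_threshold(sample: Sequence[Record], target: float, delta: float,
                     candidates: Sequence[float]) -> Optional[float]:
    """Lowest threshold t such that, with probability >= 1 - delta, the cheap
    LLM's accuracy on records with confidence >= t is at least `target`.

    `sample` must be a uniform random sample labelled by the expensive LLM.
    Splitting delta evenly across candidates (a union bound) keeps the
    guarantee valid even though every candidate is tested.
    """
    per_test_delta = delta / len(candidates)
    for t in sorted(candidates):  # lower t sends more records to the cheap LLM
        routed = [r for r in sample if r.cheap_conf >= t]
        correct = sum(r.cheap_label == r.true_label for r in routed)
        if hoeffding_lower(correct, len(routed), per_test_delta) >= target:
            return t
    return None  # nothing certifiable: fall back to the expensive LLM everywhere


def run_cascade(records: Sequence[Record], threshold: Optional[float],
                expensive: Callable[[Record], int]) -> List[int]:
    """Keep the cheap label when confident enough, else call the expensive LLM."""
    return [r.cheap_label if threshold is not None and r.cheap_conf >= threshold
            else expensive(r)
            for r in records]


def synthetic(n: int, rng: random.Random) -> List[Record]:
    """Toy data: the cheap LLM is correct with probability 0.5 + 0.5 * conf."""
    records = []
    for _ in range(n):
        conf = rng.random()
        truth = rng.randint(0, 1)
        correct = rng.random() < 0.5 + 0.5 * conf
        records.append(Record(truth if correct else 1 - truth, conf, truth))
    return records


if __name__ == "__main__":
    rng = random.Random(1)
    data = synthetic(20_000, rng)
    sample = rng.sample(data, 5_000)  # records sent to the expensive LLM for estimation
    t = choose_threshold(sample, target=0.85, delta=0.05,
                         candidates=[i / 20 for i in range(20)])
    preds = run_cascade(data, t, expensive=lambda r: r.true_label)
    cheap_share = sum(t is not None and r.cheap_conf >= t for r in data) / len(data)
    accuracy = sum(p == r.true_label for p, r in zip(preds, data)) / len(data)
    print(f"threshold={t}, handled by cheap LLM={cheap_share:.1%}, accuracy={accuracy:.3f}")
```

The certified bound covers only the records the cheap LLM keeps. Records below the threshold take the expensive LLM's answer, so when that answer is treated as ground truth, overall accuracy can only be higher. A real deployment would also count the cost of labelling the sample. A union-bounded Hoeffding test is conservative, which is exactly the looseness that data- and task-aware estimation aims to reduce.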

Paper ID: 7372
Venue: SIGMOD
Year: 2026
Pagerank: 4.1945683e-05
Overall Rank: 10,064 (29.99%)
DOI: 10.1145/3769776

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 22 of 22 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
316 | NoScope: Optimizing Neural Network Queries over Video at Scale | 2017 | VLDB | 0.00027988668
329 | Accelerating Machine Learning Inference with Probabilistic Predicates | 2018 | SIGMOD | 0.00027249545
517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035
696 | BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics | 2020 | VLDB | 0.00018048935
1,082 | CAESURA: Language Models as Multi-Modal Query Planners | 2024 | CIDR | 0.00014214232
1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154
1,963 | DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | 2025 | VLDB | 9.929429e-05
3,558 | Approximate Selection with Guarantees using Proxies | 2020 | VLDB | 6.9765724e-05
3,876 | The Design of an LLM-powered Unstructured Analytics System | 2025 | CIDR | 6.6741456e-05
3,995 | How Large Language Models Will Disrupt Data Management | 2023 | VLDB | 6.5513237e-05
4,501 | TASTI: Semantic Indexes for Machine Learning-based Queries over Unstructured Data | 2022 | SIGMOD | 6.137686e-05
4,712 | Accelerating Approximate Aggregation Queries with Expensive Predicates | 2021 | VLDB | 5.9787986e-05
4,865 | OTIF: Efficient Tracker Pre-processing over Large Video Datasets | 2022 | SIGMOD | 5.8684353e-05
5,173 | FiGO: Fine-Grained Query Optimization in Video Analytics | 2022 | SIGMOD | 5.6447253e-05
5,462 | RetClean: Retrieval-Based Data Cleaning Using LLMs and Data Lakes | 2024 | VLDB | 5.494769e-05
7,705 | AOP: Automated and Interactive LLM Pipeline Orchestration for Answering Complex Queries | 2025 | CIDR | 4.6730494e-05
7,868 | Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach | 2023 | SIGMOD | 4.6319504e-05
7,928 | Accelerating Aggregation Queries on Unstructured Streams of Data | 2023 | VLDB | 4.613455e-05
8,204 | ELEET: Efficient Learned Query Execution over Text and Tables | 2024 | VLDB | 4.5594273e-05
9,235 | ThriftLLM: On Cost-Effective Selection of Large Language Models for Classification Queries | 2025 | VLDB | 4.3690661e-05
9,351 | On Efficient Approximate Queries over Machine Learning Models | 2023 | VLDB | 4.3524472e-05
9,729 | Semantic Integrity Constraints: Declarative Guardrails for AI-Augmented Data Processing Systems | 2025 | VLDB | 4.2942813e-05

Semantically Similar Papers