Database Paper Browser

RetClean: Retrieval-Based Data Cleaning Using LLMs and Data Lakes

Summary: RetClean uses retrieval-augmented LLMs over indexed data lakes to produce provenance-aware imputations: for each dirty record, it retrieves the top-k relevant tuples from the data lake. It supports three modes: cloud LLMs for world knowledge, RAG with data-lake evidence for enterprise tables, and fine-tuned local LLMs for privacy. A GUI exposes these modes so users can explore the model choices and their trade-offs. (summarized by gpt-5-mini on Feb 09 2026)
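The retrieval step described in the summary can be illustrated with a minimal sketch. It is not RetClean's implementation: the Jaccard token-overlap scoring, the function names (`top_k_tuples`, `build_prompt`), the source-id format, and the prompt wording are all assumptions made for illustration, and the paper's actual indexing and ranking may differ. The sketch ranks data-lake tuples against a dirty record, keeps the top k, and assembles a prompt that carries each tuple's source id so an answer can cite its provenance.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def tokens(row: Row) -> set:
    """Lowercased word tokens from all non-missing values of a row."""
    out = set()
    for value in row.values():
        if value is not None:
            out.update(str(value).lower().split())
    return out


def top_k_tuples(dirty: Row, lake: List[Tuple[str, Row]], k: int = 2) -> List[Tuple[float, str, Row]]:
    """Rank data-lake tuples by Jaccard overlap with the dirty record (assumed scoring)."""
    query = tokens(dirty)
    scored = []
    for source, row in lake:
        cand = tokens(row)
        union = query | cand
        score = len(query & cand) / len(union) if union else 0.0
        scored.append((score, source, row))
    # Highest similarity first; the sort is stable, so ties keep lake order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(dirty: Row, target: str, evidence: List[Tuple[float, str, Row]]) -> str:
    """Assemble a hypothetical imputation prompt that lists evidence with source ids."""
    lines = [f"Fill in the missing '{target}' value for: {dirty}", "Evidence tuples:"]
    for score, source, row in evidence:
        lines.append(f"- [{source}] {row} (similarity {score:.2f})")
    lines.append("Answer with the value and the source id that supports it.")
    return "\n".join(lines)


# Toy data lake: (source id, tuple) pairs. All values are made up.
lake = [
    ("hr.csv#12", {"name": "Ada Lovelace", "city": "London", "dept": "Research"}),
    ("hr.csv#40", {"name": "Alan Turing", "city": "Manchester", "dept": "Research"}),
    ("sales.csv#3", {"name": "Grace Hopper", "city": "Arlington", "dept": "Sales"}),
]
dirty = {"name": "Ada Lovelace", "city": None, "dept": "Research"}

hits = top_k_tuples(dirty, lake, k=2)
prompt = build_prompt(dirty, "city", hits)
print(prompt)
```

The resulting prompt would then be sent to whichever model the chosen mode uses; that call is omitted here because it depends on the deployment.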

Paper ID: 13673
Venue: VLDB
Year: 2024
Pagerank: 5.494769e-05
Overall Rank: 5,462 | 62.01%
DOI: 10.14778/3685800.3685890

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 3 of 3 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
7,931 | In-depth Analysis of Graph-based RAG in a Unified Framework | 2025 | VLDB | 4.613363e-05
10,064 | Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees | 2026 | SIGMOD | 4.1945683e-05
10,828 | Buckaroo: A Direct Manipulation Visual Data Wrangler | 2025 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035
1,541 | Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes | 2023 | CIDR | 0.00011456579

Semantically Similar Papers