LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning
Summary: An LLM-agent framework for privacy-preserving, automatic data processing (DP) in model fine-tuning. The agent iteratively synthesizes and refines DP pipelines from prompts and feedback, without inspecting the raw data. Three accelerators (distribution-preserving sampling, low-quality target selection, and cache-and-reuse) make the pipeline search 10x faster. (summarized by gpt-5.4-mini on Apr 12 2026)
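The cache-and-reuse idea named in the summary can be illustrated with a minimal sketch. It assumes that candidate pipelines are ordered sequences of operators and that candidates sharing a prefix can reuse the intermediate result of that prefix. The operator names, the `PrefixCache` class, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical text-cleaning operators; names are illustrative only.
OPERATORS: Dict[str, Callable[[List[str]], List[str]]] = {
    "strip": lambda rows: [r.strip() for r in rows],
    "drop_empty": lambda rows: [r for r in rows if r],
    "dedup": lambda rows: list(dict.fromkeys(rows)),  # keeps first occurrence
}


class PrefixCache:
    """Caches the output of every pipeline prefix so shared prefixes run once."""

    def __init__(self, data: List[str]) -> None:
        # The empty pipeline maps to the unprocessed input.
        self._cache: Dict[Tuple[str, ...], List[str]] = {(): list(data)}
        self.operator_calls = 0  # counts operators actually executed

    def run(self, pipeline: Tuple[str, ...]) -> List[str]:
        # Find the longest prefix whose result is already cached.
        k = len(pipeline)
        while pipeline[:k] not in self._cache:
            k -= 1
        rows = self._cache[pipeline[:k]]
        # Apply only the remaining operators, caching each intermediate result.
        for i in range(k, len(pipeline)):
            rows = OPERATORS[pipeline[i]](rows)
            self.operator_calls += 1
            self._cache[pipeline[: i + 1]] = rows
        return rows


cache = PrefixCache([" a ", "", "b", "a", " b"])
first = cache.run(("strip", "drop_empty"))
second = cache.run(("strip", "drop_empty", "dedup"))  # reuses the cached prefix
print(first, second, cache.operator_calls)
```

In this sketch the second pipeline executes only the `dedup` step, because the result of its two-operator prefix is already cached. How the paper itself keys or invalidates its cache is not stated in the summary.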
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Wei Huang
2. Anda Cheng
3. Yinggui Wang
4. Lei Wang
5. Tao Wei
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 3 of 3 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,429 | DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data | 2023 | SIGMOD | 5.5087325e-05 |
| 5,921 | Data-Juicer: A One-Stop Data Processing System for Large Language Models | 2024 | SIGMOD | 5.2725159e-05 |
| 5,963 | Automatic Data Acquisition for Deep Learning | 2021 | VLDB | 5.2526794e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,840 | Revisiting Prompt Engineering via Declarative Crowdsourcing | 2024 | CIDR | 6.7106924e-05 |
| 11,058 | LLM-PBE: Assessing Data Privacy in Large Language Models | 2024 | VLDB | 4.1945683e-05 |
| 5,921 | Data-Juicer: A One-Stop Data Processing System for Large Language Models | 2024 | SIGMOD | 5.2725159e-05 |
| 1,963 | DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | 2025 | VLDB | 9.929429e-05 |
| 10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05 |
| 10,682 | AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework | 2025 | VLDB | 4.1945683e-05 |
| 10,064 | Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees | 2026 | SIGMOD | 4.1945683e-05 |
| 13,098 | Demonstrating CatDB: LLM-based Generation of Data-centric ML Pipelines | 2025 | SIGMOD | - |
| 9,219 | Intelligent Agents for Data Exploration | 2024 | VLDB | 4.3702863e-05 |
| 7,020 | LLM for Data Management | 2024 | VLDB | 4.8595728e-05 |