Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation
Summary: MegaTran uses a fine-tuned lightweight LLM to turn loose user prompts into structured task descriptions (Weak2StrongPrompt), after which a powerful LLM generates the transformation code (Prompt2Code). Sanity-check reflection with checklists and LazyRAG retrieval of external snippets make the results explainable; overall, the approach is low-cost and 2.2–26.1% more accurate than prior methods.
(summarized by gpt-5-mini on Feb 09 2026)
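As a rough illustration of the two-stage flow summarized above, the sketch below wires a lightweight model, a powerful model, and a retriever into one loop. Everything in it is an assumption for illustration, not MegaTran's actual interface: the names `TaskDescription`, `sanity_check`, `run_pipeline`, and the `fake_*` stubs are hypothetical, the "checklist" is reduced to input/expected-output examples, and LazyRAG is read loosely as "retrieve external snippets only after a sanity check fails." The model calls are stubbed so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (input value, expected output value)


@dataclass
class TaskDescription:
    """Hypothetical structured task spec emitted by the lightweight model."""
    goal: str
    examples: List[Example]


def sanity_check(code: str, examples: List[Example]) -> List[str]:
    """Run generated code on every example; return human-readable failures.

    The examples act as a simple checklist (an assumption of this sketch);
    the returned messages are what makes a rejected program explainable.
    """
    namespace: Dict[str, object] = {}
    try:
        exec(code, namespace)  # sketch only: sandbox untrusted code in practice
        transform = namespace["transform"]
    except Exception as exc:
        return [f"code failed to load: {exc!r}"]
    failures: List[str] = []
    for value, expected in examples:
        try:
            got = transform(value)
        except Exception as exc:
            failures.append(f"{value!r}: raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{value!r}: expected {expected!r}, got {got!r}")
    return failures


def run_pipeline(
    user_prompt: str,
    examples: List[Example],
    light_llm: Callable[[str, List[Example]], TaskDescription],
    powerful_llm: Callable[[TaskDescription, List[str]], str],
    retrieve: Callable[[TaskDescription], List[str]],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Return (passing code, log of failures seen in earlier rounds)."""
    # Stage 1 (Weak2StrongPrompt): cheap model turns a loose prompt into a spec.
    task = light_llm(user_prompt, examples)
    context: List[str] = []  # external snippets, fetched lazily
    log: List[str] = []
    for _ in range(max_rounds):
        # Stage 2 (Prompt2Code): expensive model writes the transformation code.
        code = powerful_llm(task, context)
        failures = sanity_check(code, task.examples)
        if not failures:
            return code, log
        log.extend(failures)
        # Lazy retrieval: only pay for external snippets after a check fails.
        if not context:
            context = retrieve(task)
    raise RuntimeError("no passing program: " + "; ".join(log))


# Offline stubs standing in for real model and retriever calls (hypothetical).
def fake_light_llm(prompt: str, examples: List[Example]) -> TaskDescription:
    return TaskDescription(goal="reformat ISO dates as MM/DD/YYYY", examples=examples)


def fake_powerful_llm(task: TaskDescription, context: List[str]) -> str:
    if not context:  # first attempt without help: a plausible but wrong guess
        return "def transform(s):\n    return s.replace('-', '/')\n"
    return (
        "def transform(s):\n"
        "    from datetime import datetime\n"
        "    return datetime.strptime(s, '%Y-%m-%d').strftime('%m/%d/%Y')\n"
    )


def fake_retrieve(task: TaskDescription) -> List[str]:
    return ["datetime.strptime(s, '%Y-%m-%d').strftime('%m/%d/%Y')"]


if __name__ == "__main__":
    code, log = run_pipeline(
        "fix the dates", [("2024-01-05", "01/05/2024")],
        fake_light_llm, fake_powerful_llm, fake_retrieve,
    )
    print(log)  # the failed first attempt, kept as an explanation
    print(code)
```

In this reading, keeping the failure log next to the accepted program is what "explainable" amounts to, and deferring retrieval means tasks the powerful model solves on its first attempt never incur the retrieval cost.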
- Paper ID: 13885
- Venue: VLDB
- Year: 2025
- Pagerank: 4.1945683e-05
- Overall Rank: 10,610 | 26.19%
- DOI: 10.14778/3742728.3742734
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 20 of 20 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 112 | Potter's Wheel: An Interactive Data Cleaning System | 2001 | VLDB | 0.00047045036 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 1,012 | NADEEF: A Commodity Data Cleaning System | 2013 | SIGMOD | 0.0001464733 |
| 1,267 | Foofah: Transforming Data By Example | 2017 | SIGMOD | 0.00012936483 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 1,469 | BlinkFill: Semi-supervised Programming By Example for Syntactic String Transformations | 2016 | VLDB | 0.00011836053 |
| 1,831 | Synthesizing Entity Matching Rules by Examples | 2018 | VLDB | 0.00010384082 |
| 2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05 |
| 3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05 |
| 3,192 | Towards Dependable Data Repairing with Fixing Rules | 2014 | SIGMOD | 7.4095761e-05 |
| 3,478 | Transform-Data-by-Example (TDE): An Extensible Search Engine for Data Transformations | 2018 | VLDB | 7.054159e-05 |
| 4,212 | Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration | 2023 | SIGMOD | 6.3555142e-05 |
| 4,908 | Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL | 2024 | VLDB | 5.8339245e-05 |
| 5,096 | Auto-Transform: Learning-to-Transform by Patterns | 2020 | VLDB | 5.7011825e-05 |
| 5,280 | Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V | 2023 | VLDB | 5.5896735e-05 |
| 5,981 | DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python | 2021 | SIGMOD | 5.2448986e-05 |
| 6,800 | DTT: An Example-Driven Tabular Transformer for Joinability by Leveraging Large Language Models | 2024 | SIGMOD | 4.9231471e-05 |
| 7,458 | Trinity: An Extensible Synthesis Framework for Data Science | 2019 | VLDB | 4.7245406e-05 |
| 8,000 | Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics | 2019 | VLDB | 4.6092803e-05 |
| 9,577 | CoClean: Collaborative Data Cleaning | 2020 | SIGMOD | 4.3248438e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,595 | Optimized Batch Prompting for Cost-effective LLMs | 2025 | VLDB | 4.1945683e-05 |
| 6,800 | DTT: An Example-Driven Tabular Transformer for Joinability by Leveraging Large Language Models | 2024 | SIGMOD | 4.9231471e-05 |
| 6,389 | Chat2Data: An Interactive Data Analysis System with RAG, Vector Databases and LLMs | 2024 | VLDB | 5.0844009e-05 |
| 8,155 | Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study | 2024 | SIGMOD | 4.5745248e-05 |
| 10,064 | Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees | 2026 | SIGMOD | 4.1945683e-05 |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 10,316 | LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning | 2026 | VLDB | 4.1945683e-05 |
| 3,840 | Revisiting Prompt Engineering via Declarative Crowdsourcing | 2024 | CIDR | 6.7106924e-05 |
| 9,399 | TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations | 2025 | VLDB | 4.3441378e-05 |
| 7,020 | LLM for Data Management | 2024 | VLDB | 4.8595728e-05 |