Automatic Data Acquisition for Deep Learning
Summary: AutoData is a reinforcement learning (RL) guided system that automatically acquires training data from open ML benchmarks and data markets to support deep learning (DL) training. Its policy learns from AutoML feedback to steer the search toward high-quality data. The system is demonstrated on image classification and relational data prediction. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Jiabin Liu
2. Fu Zhu
3. Chengliang Chai
4. Yuyu Luo
5. Nan Tang
Incoming Citations (Sorted by Pagerank)
Showing 7 of 7 citing papers.
| Overall Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,102 | GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data | 2023 | SIGMOD | 6.4522929e-05 |
| 5,381 | Selective Data Acquisition in the Wild for Model Charging | 2022 | VLDB | 5.5399508e-05 |
| 5,976 | Responsible Data Integration: Next-generation Challenges | 2022 | SIGMOD | 5.245976e-05 |
| 7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 4.8078895e-05 |
| 8,268 | Learned Data-aware Image Representations of Line Charts for Similarity Search | 2023 | SIGMOD | 4.5456668e-05 |
| 8,281 | Optimizing Data Acquisition to Enhance Machine Learning Performance | 2024 | VLDB | 4.5435639e-05 |
| 10,316 | LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning | 2026 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Overall Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555 |
| 1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118 |
| 1,751 | Auctus: A Dataset Search Engine for Data Discovery and Augmentation | 2021 | VLDB | 0.00010683295 |
| 4,825 | Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks | 2021 | SIGMOD | 5.8946721e-05 |
| 5,362 | Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach | 2016 | SIGMOD | 5.5473503e-05 |
| 7,575 | Human-in-the-loop Outlier Detection | 2020 | SIGMOD | 4.7068909e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,304 | A Scalable AutoML Approach Based on Graph Neural Networks | 2022 | VLDB | 5.5779335e-05 |
| 1,993 | Automatically Generating Data Exploration Sessions Using Deep Reinforcement Learning | 2020 | SIGMOD | 9.8453334e-05 |
| 4,540 | Automating Exploratory Data Analysis via Machine Learning: An Overview | 2020 | SIGMOD | 6.1033443e-05 |
| 1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 0.00011869295 |
| 8,281 | Optimizing Data Acquisition to Enhance Machine Learning Performance | 2024 | VLDB | 4.5435639e-05 |
| 9,219 | Intelligent Agents for Data Exploration | 2024 | VLDB | 4.3702863e-05 |
| 5,383 | Auto-Pipeline: Synthesizing Complex Data Pipelines By-Target Using Reinforcement Learning and Search | 2021 | VLDB | 5.5393038e-05 |
| 10,955 | Data Acquisition for Improving Model Confidence | 2024 | SIGMOD | 4.1945683e-05 |
| 3,750 | Data Acquisition for Improving Machine Learning Models | 2021 | VLDB | 6.7895763e-05 |
| 5,381 | Selective Data Acquisition in the Wild for Model Charging | 2022 | VLDB | 5.5399508e-05 |