Selective Data Acquisition in the Wild for Model Charging
Summary: AutoData enables end-to-end, selective acquisition of labeled data from heterogeneous real-world sources for model charging. It first discovers relevant datasets and then clusters the data across sources. A bandit/DRL-driven sampler then iteratively selects a cluster, samples points from it, and updates the rewards to optimize utility.
(summarized by gpt-5-nano on Feb 09 2026)
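The select, sample, and update loop in the summary can be illustrated with a minimal sketch. It assumes a plain UCB1 bandit, which is a generic stand-in and not necessarily the sampler the paper uses. The names `ucb1_select`, `acquire`, and `reward_fn` are hypothetical, and the reward is a placeholder for whatever utility signal the real system computes.

```python
import math
import random


def ucb1_select(live, counts, totals, t):
    """Pick one cluster index from `live` using the UCB1 rule."""
    # Try every cluster once before trusting any average reward.
    for i in live:
        if counts[i] == 0:
            return i

    def score(i):
        mean = totals[i] / counts[i]
        bonus = math.sqrt(2.0 * math.log(t) / counts[i])
        return mean + bonus

    return max(live, key=score)


def acquire(clusters, reward_fn, budget, seed=0):
    """Iteratively select a cluster, sample a point, and update its reward.

    clusters:  list of lists of candidate points (assumed already clustered)
    reward_fn: hypothetical utility of acquiring one point
    budget:    maximum number of points to acquire
    """
    rng = random.Random(seed)
    pools = [list(c) for c in clusters]  # copy so the input is not mutated
    counts = [0] * len(pools)            # points drawn per cluster
    totals = [0.0] * len(pools)          # summed reward per cluster
    acquired = []
    for t in range(1, budget + 1):
        live = [i for i, pool in enumerate(pools) if pool]
        if not live:
            break  # every cluster is exhausted
        arm = ucb1_select(live, counts, totals, t)
        # Sample without replacement from the chosen cluster.
        point = pools[arm].pop(rng.randrange(len(pools[arm])))
        counts[arm] += 1
        totals[arm] += reward_fn(point)
        acquired.append(point)
    return acquired, counts


if __name__ == "__main__":
    # Toy data: points in the second cluster are always useful (reward 1.0).
    toy_clusters = [[0] * 50, [1] * 50]
    picked, per_cluster = acquire(toy_clusters, lambda p: float(p), budget=40)
    print(per_cluster)
```

In this toy run the sampler should concentrate most of its budget on the cluster whose points earn the higher reward, while still probing the other cluster occasionally.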
- Paper ID: 12653
- Venue: VLDB
- Year: 2022
- Pagerank: 5.5399508e-05
- Overall Rank: 5,381 | 62.57%
- DOI: 10.14778/3523210.3523223
Incoming Citations (Sorted by Pagerank)
Showing 17 of 17 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,727 | Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection | 2022 | VLDB | 6.8141709e-05 |
| 3,970 | HAIChart: Human and AI Paired Visualization System | 2024 | VLDB | 6.5784767e-05 |
| 4,102 | GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data | 2023 | SIGMOD | 6.4522929e-05 |
| 5,371 | LearnedSQLGen: Constraint-aware SQL Generation using Reinforcement Learning | 2022 | SIGMOD | 5.5428776e-05 |
| 7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 4.8078895e-05 |
| 7,582 | LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes | 2024 | VLDB | 4.7046388e-05 |
| 8,116 | LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes | 2024 | VLDB | 4.581507e-05 |
| 8,268 | Learned Data-aware Image Representations of Line Charts for Similarity Search | 2023 | SIGMOD | 4.5456668e-05 |
| 8,281 | Optimizing Data Acquisition to Enhance Machine Learning Performance | 2024 | VLDB | 4.5435639e-05 |
| 9,365 | Falcon: Fair Active Learning using Multi-armed Bandits | 2024 | VLDB | 4.3502315e-05 |
| 9,928 | Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search | 2024 | VLDB | 4.2511622e-05 |
| 10,100 | AixelNet: A Pre-trained Model with Table-aware Adaptation for Structured Data Prediction | 2026 | SIGMOD | 4.1945683e-05 |
| 10,289 | LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning | 2026 | VLDB | 4.1945683e-05 |
| 10,465 | A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces | 2025 | SIGMOD | 4.1945683e-05 |
| 10,471 | Approximating Opaque Top-k Queries | 2025 | SIGMOD | 4.1945683e-05 |
| 10,955 | Data Acquisition for Improving Model Confidence | 2024 | SIGMOD | 4.1945683e-05 |
| 11,000 | MisDetect: Iterative Mislabel Detection using Early Loss | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 13 of 13 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555 |
| 1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118 |
| 1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 0.00011869295 |
| 3,358 | Organizing Data Lakes for Navigation | 2020 | SIGMOD | 7.1784949e-05 |
| 3,750 | Data Acquisition for Improving Machine Learning Models | 2021 | VLDB | 6.7895763e-05 |
| 4,749 | Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models | 2021 | SIGMOD | 5.9503689e-05 |
| 4,825 | Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks | 2021 | SIGMOD | 5.8946721e-05 |
| 5,279 | CDB: A Crowd-Powered Database System | 2018 | VLDB | 5.5902418e-05 |
| 5,362 | Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach | 2016 | SIGMOD | 5.5473503e-05 |
| 5,963 | Automatic Data Acquisition for Deep Learning | 2021 | VLDB | 5.2526794e-05 |
| 6,467 | Tailoring Data Source Distributions for Fairness-aware Data Integration | 2021 | VLDB | 5.0528156e-05 |
| 7,575 | Human-in-the-loop Outlier Detection | 2020 | SIGMOD | 4.7068909e-05 |
| 11,582 | Interactively Discovering and Ranking Desired Tuples without Writing SQL Queries | 2020 | SIGMOD | 4.1945683e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,142 | Active Learning for ML Enhanced Database Systems | 2020 | SIGMOD | 7.4815444e-05 |
| 608 | DeepDB: Learn from Data, not from Queries! | 2020 | VLDB | 0.00019235898 |
| 9,219 | Intelligent Agents for Data Exploration | 2024 | VLDB | 4.3702863e-05 |
| 8,989 | Stochastic Data Acquisition for Answering Queries as Time Goes by | 2017 | VLDB | 4.413361e-05 |
| 4,540 | Automating Exploratory Data Analysis via Machine Learning: An Overview | 2020 | SIGMOD | 6.1033443e-05 |
| 6,519 | Expand your Training Limits! Generating Training Data for ML-based Data Management | 2021 | SIGMOD | 5.0316686e-05 |
| 10,955 | Data Acquisition for Improving Model Confidence | 2024 | SIGMOD | 4.1945683e-05 |
| 8,281 | Optimizing Data Acquisition to Enhance Machine Learning Performance | 2024 | VLDB | 4.5435639e-05 |
| 3,750 | Data Acquisition for Improving Machine Learning Models | 2021 | VLDB | 6.7895763e-05 |
| 5,963 | Automatic Data Acquisition for Deep Learning | 2021 | VLDB | 5.2526794e-05 |