Database Paper Browser

Selective Data Acquisition in the Wild for Model Charging

Summary: AutoData is an end-to-end system for selectively acquiring labeled data from heterogeneous real-world sources for model charging. It first discovers relevant datasets and then clusters the data across sources. A bandit- or DRL-driven sampler then iterates: it selects a cluster, samples points from it, and updates that cluster's reward, with the goal of optimizing utility. (summarized by gpt-5-nano on Feb 09 2026)
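
The select-sample-update loop in the summary can be illustrated with a small sketch. This is not AutoData's implementation: it swaps in the textbook UCB1 bandit rule, treats each cluster as an arm, and assumes a hypothetical `utility` callback that scores a sampled batch in [0, 1]. The function name `ucb1_cluster_sampler` and all parameter names are invented for illustration.

```python
import math
import random


def ucb1_cluster_sampler(clusters, utility, rounds, batch_size=1, seed=0):
    """Pick a cluster, sample points from it, update its reward; repeat.

    clusters: dict mapping cluster id -> list of candidate points
    utility:  hypothetical callable(batch) -> reward in [0, 1]
    Returns (acquired points, pulls per cluster, mean reward per cluster).
    """
    rng = random.Random(seed)
    pools = {c: list(pts) for c, pts in clusters.items()}  # points not yet sampled
    pulls = {c: 0 for c in pools}
    mean_reward = {c: 0.0 for c in pools}
    acquired = []

    for t in range(1, rounds + 1):
        active = [c for c in pools if pools[c]]  # clusters with points left
        if not active:
            break
        untried = [c for c in active if pulls[c] == 0]
        if untried:
            # Try every cluster once before trusting the estimates.
            chosen = untried[0]
        else:
            # UCB1: estimated reward plus an exploration bonus.
            chosen = max(
                active,
                key=lambda c: mean_reward[c]
                + math.sqrt(2 * math.log(t) / pulls[c]),
            )
        # Sample a batch from the chosen cluster without replacement.
        k = min(batch_size, len(pools[chosen]))
        batch = [
            pools[chosen].pop(rng.randrange(len(pools[chosen])))
            for _ in range(k)
        ]
        acquired.extend(batch)
        # Update the running mean reward of the chosen cluster.
        reward = utility(batch)
        pulls[chosen] += 1
        mean_reward[chosen] += (reward - mean_reward[chosen]) / pulls[chosen]

    return acquired, pulls, mean_reward


# Toy usage: one cluster always helps (reward 1), the other never does.
toy_clusters = {"useful": [1] * 50, "noisy": [0] * 50}
acquired, pulls, mean_reward = ucb1_cluster_sampler(
    toy_clusters, utility=lambda batch: sum(batch) / len(batch), rounds=40
)
```

In this toy setting the sampler spends most of its 40 rounds on the "useful" cluster while still revisiting the "noisy" one occasionally, which is the exploration behavior a bandit rule is meant to provide. In a real pipeline the reward would more plausibly come from a change in model quality after adding the batch, which is an assumption here rather than something the summary specifies.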

Paper ID: 12653
Venue: VLDB
Year: 2022
Pagerank: 5.5399508e-05
Overall Rank: 5,381 | 62.57%
DOI: 10.14778/3523210.3523223

[Figure: Incoming Non-self Citations Over Time]

Authors

Incoming Citations (Sorted by Pagerank)

Showing 17 of 17 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
3,727 | Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection | 2022 | VLDB | 6.8141709e-05
3,970 | HAIChart: Human and AI Paired Visualization System | 2024 | VLDB | 6.5784767e-05
4,102 | GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data | 2023 | SIGMOD | 6.4522929e-05
5,371 | LearnedSQLGen: Constraint-aware SQL Generation using Reinforcement Learning | 2022 | SIGMOD | 5.5428776e-05
7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 4.8078895e-05
7,582 | LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes | 2024 | VLDB | 4.7046388e-05
8,116 | LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes | 2024 | VLDB | 4.581507e-05
8,268 | Learned Data-aware Image Representations of Line Charts for Similarity Search | 2023 | SIGMOD | 4.5456668e-05
8,281 | Optimizing Data Acquisition to Enhance Machine Learning Performance | 2024 | VLDB | 4.5435639e-05
9,365 | Falcon: Fair Active Learning using Multi-armed Bandits | 2024 | VLDB | 4.3502315e-05
9,928 | Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search | 2024 | VLDB | 4.2511622e-05
10,100 | AixelNet: A Pre-trained Model with Table-aware Adaptation for Structured Data Prediction | 2026 | SIGMOD | 4.1945683e-05
10,289 | LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning | 2026 | VLDB | 4.1945683e-05
10,465 | A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces | 2025 | SIGMOD | 4.1945683e-05
10,471 | Approximating Opaque Top-k Queries | 2025 | SIGMOD | 4.1945683e-05
10,955 | Data Acquisition for Improving Model Confidence | 2024 | SIGMOD | 4.1945683e-05
11,000 | MisDetect: Iterative Mislabel Detection using Early Loss | 2024 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers