Optimizing Data Acquisition to Enhance Machine Learning Performance
Summary: The paper introduces IAS, an online clustering-based acquisition method that incrementally updates the target model (avoiding full retraining) and uses adaptive scores to balance exploration and exploitation when selecting clusters. It extends IAS to IAS-AMS, which selects adaptive mini-batches from multiple clusters to remove single-cluster bias. IAS gives the best efficiency, while IAS-AMS yields superior labeling effectiveness with runtime comparable to CTS. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Tingting Wang
2. Shixun Huang
3. Zhifeng Bao
4. J. Shane Culpepper
5. Volkan Dedeoglu
6. Reza Arablouei
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,392 | Shapley Value Estimation Based on Differential Matrix | 2025 | SIGMOD | 4.1945683e-05 |
| 10,465 | A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces | 2025 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 11 of 11 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,337 | Learned Index Benefits: Machine Learning Based Index Performance Estimation | 2022 | VLDB | 5.5635208e-05 |
| 13,184 | ML2DAC: Meta-learning to Democratize AutoML for Clustering Analyses | 2023 | SIGMOD | - |
| 4,749 | Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models | 2021 | SIGMOD | 5.9503689e-05 |
| 7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 4.8078895e-05 |
| 3,118 | Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning | 2015 | VLDB | 7.5379338e-05 |
| 3,142 | Active Learning for ML Enhanced Database Systems | 2020 | SIGMOD | 7.4815444e-05 |
| 5,963 | Automatic Data Acquisition for Deep Learning | 2021 | VLDB | 5.2526794e-05 |
| 5,381 | Selective Data Acquisition in the Wild for Model Charging | 2022 | VLDB | 5.5399508e-05 |
| 10,955 | Data Acquisition for Improving Model Confidence | 2024 | SIGMOD | 4.1945683e-05 |
| 3,750 | Data Acquisition for Improving Machine Learning Models | 2021 | VLDB | 6.7895763e-05 |