Data Acquisition for Improving Machine Learning Models
Summary: Formalizes a data-market framework that models the interaction between data buyers and data providers, in which training data is acquired to improve ML model accuracy. Proposes two strategies, EA and SPS, that balance exploration and exploitation during acquisition; both are validated on real datasets. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Yifan Li
2. Xiaohui Yu
3. Nick Koudas
Incoming Citations (Sorted by Pagerank)
Showing 13 of 13 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 10 of 10 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Overall Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 903 | To Join or Not to Join? Thinking Twice about Joins before Feature Selection | 2016 | SIGMOD | 1.547016e-04 |
| 1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 1.1869295e-04 |
| 1,660 | Data Markets in the Cloud: An Opportunity for the Database Community | 2011 | VLDB | 1.0979534e-04 |
| 1,771 | On Arbitrage-free Pricing for General Data Queries | 2014 | VLDB | 1.0617356e-04 |
| 1,891 | Towards Model-based Pricing for Machine Learning in a Data Marketplace | 2019 | SIGMOD | 1.0194092e-04 |
| 2,359 | Data Market Platforms: Trading Data Assets to Solve Data Problems | 2020 | VLDB | 8.9607667e-05 |
| 3,142 | Active Learning for ML Enhanced Database Systems | 2020 | SIGMOD | 7.4815444e-05 |
| 3,954 | Efficiently Approximating Selectivity Functions using Low Overhead Regression Models | 2020 | VLDB | 6.5926838e-05 |
| 4,129 | Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers? | 2018 | VLDB | 6.428887e-05 |
| 4,279 | Revenue Maximization for Query Pricing | 2020 | VLDB | 6.2953388e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,024 | Towards Distribution-aware Query Answering in Data Markets | 2022 | VLDB | 5.7535043e-05 |
| 11,003 | Performance-Based Pricing for Federated Learning via Auction | 2024 | VLDB | 4.1945683e-05 |
| 9,219 | Intelligent Agents for Data Exploration | 2024 | VLDB | 4.3702863e-05 |
| 9,351 | On Efficient Approximate Queries over Machine Learning Models | 2023 | VLDB | 4.3524472e-05 |
| 8,989 | Stochastic Data Acquisition for Answering Queries as Time Goes by | 2017 | VLDB | 4.413361e-05 |
| 1,891 | Towards Model-based Pricing for Machine Learning in a Data Marketplace | 2019 | SIGMOD | 1.0194092e-04 |
| 5,963 | Automatic Data Acquisition for Deep Learning | 2021 | VLDB | 5.2526794e-05 |
| 5,381 | Selective Data Acquisition in the Wild for Model Charging | 2022 | VLDB | 5.5399508e-05 |
| 8,281 | Optimizing Data Acquisition to Enhance Machine Learning Performance | 2024 | VLDB | 4.5435639e-05 |
| 10,955 | Data Acquisition for Improving Model Confidence | 2024 | SIGMOD | 4.1945683e-05 |