Datamap-Driven Tabular Coreset Selection for Classifier Training
Summary: Presents a datamap-driven algorithm that constructs tabular coresets of a user-specified size from gradient-boosted decision tree (GBDT) datamaps in minutes, producing models that match or exceed full-dataset performance. Also introduces a datamap-based inference-time enhancement with provable guarantees, and supports explainability, coreset-size tuning, and robustness to frequent feature additions. (summarized by gpt-5-mini on Feb 09 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Aviv Hadar
2. Tova Milo
3. Kathy Razmadze
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,455 | Sentence to Model: Cost-Effective Data Collection LLM Agent | 2025 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,204 | VerdictDB: Universalizing Approximate Query Processing | 2018 | SIGMOD | 0.00013319541 |
| 1,260 | Dynamic Sample Selection for Approximate Query Processing | 2003 | SIGMOD | 0.00012993347 |
| 1,667 | Structured Search Result Differentiation | 2009 | VLDB | 0.00010960247 |
| 2,501 | DBEst: Revisiting Approximate Query Processing Engines with Machine Learning Models | 2019 | SIGMOD | 8.6453446e-05 |
| 7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 4.8078895e-05 |
| 7,494 | SubStrat: A Subset-Based Optimization Strategy for Faster AutoML | 2023 | VLDB | 4.7180617e-05 |