Database Paper Browser

GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data

Summary: GoodCore selects a coreset over incomplete data by modeling each possible repair of the missing values as a possible world and optimizing the expected coreset across those worlds, without cleaning the data first. The paper proves the problem NP-hard and offers an approximation algorithm with imputation-based variants, enabling data-efficient ML. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6660
Venue: SIGMOD
Year: 2023
Pagerank: 6.4522929e-05
Overall Rank: 4,102 | 71.47%
DOI: 10.1145/3589302

Authors

Incoming Citations (Sorted by Pagerank)

Showing 12 of 12 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 16 of 16 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
49 | Consistent Query Answers in Inconsistent Databases | 1999 | PODS | 0.00067660624
656 | ERACER: A Database Approach for Statistical Inference and Data Cleaning | 2010 | SIGMOD | 0.00018588729
791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664
2,302 | Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions | 2021 | VLDB | 9.0668832e-05
2,566 | Database Repairs and Consistent Query Answering: Origins and Further Developments | 2019 | PODS | 8.5243847e-05
3,311 | Efficient and Effective Data Imputation with Influence Functions | 2022 | VLDB | 7.2406486e-05
4,825 | Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks | 2021 | SIGMOD | 5.8946721e-05
5,028 | Adaptive Data Augmentation for Supervised Learning over Missing Data | 2021 | VLDB | 5.7506746e-05
5,279 | CDB: A Crowd-Powered Database System | 2018 | VLDB | 5.5902418e-05
5,362 | Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach | 2016 | SIGMOD | 5.5473503e-05
5,381 | Selective Data Acquisition in the Wild for Model Charging | 2022 | VLDB | 5.5399508e-05
5,963 | Automatic Data Acquisition for Deep Learning | 2021 | VLDB | 5.2526794e-05
7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 4.8078895e-05
7,575 | Human-in-the-loop Outlier Detection | 2020 | SIGMOD | 4.7068909e-05
9,221 | VisClean: Interactive Cleaning for Progressive Visualization | 2020 | VLDB | 4.3699444e-05
11,582 | Interactively Discovering and Ranking Desired Tuples without Writing SQL Queries | 2020 | SIGMOD | 4.1945683e-05

Semantically Similar Papers