Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms
Summary: The paper values ML training data with Shapley values. For unweighted KNN, exact Shapley values are computed in O(N log N) time, an exponential improvement over the O(2^N) cost of evaluating the definition directly. An LSH-based (epsilon, delta)-approximation runs in sublinear O(N^{h(epsilon,K)} log N) time, with exponent h(epsilon,K) < 1. Extensions cover weighted KNN, multi-curator data, and a Monte Carlo approximation with O(N (log N)^2 / (log K)^2) complexity. Experiments scale to 10M data points. (summarized by gpt-5-nano on Feb 09 2026)
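The O(N log N) bound for unweighted KNN comes from one sort by distance to the test point followed by a linear-time recursion over the sorted order. The sketch below is a minimal illustration for a single test point, not the authors' implementation. It assumes the utility is the fraction of the K nearest neighbors whose label matches the test label, and that 1 <= K <= N. The function name `knn_shapley_single` and its argument layout are hypothetical.

```python
def knn_shapley_single(dists, labels, y_test, K):
    """Shapley value of each training point for one test point (unweighted KNN).

    dists[i]  : distance from training point i to the test point
    labels[i] : label of training point i
    Assumes the utility is (number of matching labels among the K nearest) / K.
    """
    n = len(dists)
    if not 1 <= K <= n:
        raise ValueError("this sketch assumes 1 <= K <= N")

    # O(N log N): indices of training points, nearest first.
    order = sorted(range(n), key=lambda i: dists[i])
    match = [1.0 if labels[i] == y_test else 0.0 for i in order]

    # O(N): start from the farthest point and walk toward the nearest.
    s = [0.0] * n
    s[-1] = match[-1] / n
    for pos in range(n - 2, -1, -1):
        rank = pos + 1  # 1-based rank in the sorted order
        s[pos] = s[pos + 1] + (match[pos] - match[pos + 1]) / K * min(K, rank) / rank

    # Map values from sorted positions back to the original indices.
    values = [0.0] * n
    for pos, idx in enumerate(order):
        values[idx] = s[pos]
    return values


values = knn_shapley_single([0.1, 0.2, 0.3], [1, 0, 1], y_test=1, K=2)
print(values)
```

The values sum to the utility of the full training set (here 1/2, since one of the two nearest neighbors matches), which is the efficiency property of the Shapley value. For a test set, the per-test-point values can be averaged, since the Shapley value is linear in the utility.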
Authors
1. Ruoxi Jia
2. David Dao
3. Boxin Wang
4. Frances Ann Hubis
5. Nezihe Merve Gurel
6. Bo Li
7. Ce Zhang
8. Costas Spanos
9. Dawn Song
Incoming Citations (Sorted by Pagerank)
Showing 23 of 23 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 34 | Similarity Search in High Dimensions via Hashing | 1999 | VLDB | 0.00076637636 |
| 1,771 | On Arbitrage-free Pricing for General Data Queries | 2014 | VLDB | 0.00010617356 |
| 2,743 | Toward Practical Query Pricing with QueryMarket | 2013 | SIGMOD | 8.1897331e-05 |
| 2,820 | Price-Optimal Querying with Data APIs | 2016 | VLDB | 8.062913e-05 |
| 4,477 | How to Price Shared Optimizations in the Cloud | 2012 | VLDB | 6.1509882e-05 |
| 5,800 | QueryMarket Demonstration: Pricing for Online Data Markets | 2012 | VLDB | 5.3211601e-05 |
| 6,344 | QIRANA Demonstration: Real Time Scalable Query Pricing | 2017 | VLDB | 5.1023673e-05 |
| 7,044 | A Demonstration of Sterling: A Privacy-Preserving Data Marketplace | 2018 | VLDB | 4.8529797e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,380 | Efficient Sampling Approaches to Shapley Value Approximation | 2023 | SIGMOD | 4.746272e-05 |
| 5,224 | Neighbor-Sensitive Hashing | 2016 | VLDB | 5.6197981e-05 |
| 2,868 | Computing the Shapley Value of Facts in Query Answering | 2022 | SIGMOD | 7.9816425e-05 |
| 7,321 | Counterfactual Explanation of Shapley Value in Data Coalitions | 2024 | VLDB | 4.7629325e-05 |
| 10,655 | A Comprehensive Study of Shapley Value in Data Analytics | 2025 | VLDB | 4.1945683e-05 |
| 6,262 | Fast Shapley Value Computation in Data Assemblage Tasks as Cooperative Simple Games | 2024 | SIGMOD | 5.1349507e-05 |
| 7,932 | P-Shapley: Shapley Values on Probabilistic Classifiers | 2024 | VLDB | 4.613363e-05 |
| 10,524 | Understanding the Black Box: A Deep Empirical Dive into Shapley Value Approximations for Tabular Data | 2025 | SIGMOD | 4.1945683e-05 |
| 6,723 | On Shapley Value in Data Assemblage Under Independent Utility | 2022 | VLDB | 4.9490816e-05 |
| 6,263 | Equitable Data Valuation Meets the Right to Be Forgotten in Model Markets | 2023 | VLDB | 5.1349507e-05 |