Deduplicated Sampling On-Demand
Summary: RadlER produces clean stratified samples from dirty datasets in which a real-world entity may be represented by several duplicate records, avoiding the bias those duplicates introduce. It deduplicates on demand, cleaning only the entities required for the sample, and matches the target group distribution far more efficiently than baselines that deduplicate the full dataset before sampling. (summarized by gpt-5-mini on Feb 09 2026)
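The idea in the summary can be illustrated with a minimal sketch. This is not RadlER's actual algorithm, which the summary does not specify; it is a simple rejection-sampling stand-in under stated assumptions. The names `sample_on_demand`, `block_key`, `is_match`, and `group_of` are hypothetical, the matcher is assumed to be transitive, and the toy records are invented for illustration. A drawn record's entity is resolved only when needed, and the draw is accepted with probability 1/|cluster| so that entities with many duplicates are not over-represented.

```python
import random
from collections import defaultdict


def sample_on_demand(records, block_key, is_match, group_of, targets,
                     seed=0, max_draws=10_000):
    """Draw one deduplicated entity per quota slot, resolving entities lazily.

    All parameters are hypothetical helpers for this sketch:
    block_key(record) -> cheap blocking key,
    is_match(a, b)    -> expensive pairwise matcher (assumed transitive),
    group_of(cluster) -> stratum label of a resolved entity,
    targets           -> {group label: number of entities wanted}.
    """
    rng = random.Random(seed)

    # Cheap blocking index: only records sharing a block are ever compared.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)

    remaining = dict(targets)
    chosen = {}  # entity id (smallest record index in cluster) -> cluster records
    for _ in range(max_draws):
        if all(n == 0 for n in remaining.values()):
            break
        i = rng.randrange(len(records))
        # Resolve only the entity of the drawn record.
        cluster = [j for j in blocks[block_key(records[i])]
                   if j == i or is_match(records[i], records[j])]
        # Accept with probability 1/|cluster|: every entity is then equally
        # likely per draw, regardless of how many duplicates it has.
        if rng.random() >= 1.0 / len(cluster):
            continue
        entity_id = min(cluster)
        members = [records[j] for j in cluster]
        group = group_of(members)
        if entity_id in chosen or remaining.get(group, 0) == 0:
            continue
        chosen[entity_id] = members
        remaining[group] -= 1
    return chosen


# Toy data (invented): "acme" appears three times, "core" twice.
records = [
    {"name": "acme", "country": "IT"},
    {"name": "acme", "country": "IT"},
    {"name": "acme", "country": "IT"},
    {"name": "bolt", "country": "IT"},
    {"name": "core", "country": "DE"},
    {"name": "core", "country": "DE"},
    {"name": "dyne", "country": "DE"},
]
sample = sample_on_demand(
    records,
    block_key=lambda r: r["name"][0],
    is_match=lambda a, b: a["name"] == b["name"],
    group_of=lambda cluster: cluster[0]["country"],
    targets={"IT": 1, "DE": 1},
)
```

In this sketch the expensive `is_match` comparisons run only inside the blocks of records that are actually drawn, which is the contrast with a baseline that deduplicates every block before sampling.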
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Luca Zecchini
2. Vasilis Efthymiou
3. Felix Naumann
4. Giovanni Simonini
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,807 | RadlER: Deduplicated Sampling On-Demand | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 32 of 32 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,959 | Reservoir Sampling over Joins | 2024 | SIGMOD | 4.4206222e-05 |
| 8,910 | R2D2: Reducing Redundancy and Duplication in Data Lakes | 2023 | SIGMOD | 4.427232e-05 |
| 936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549 |
| 4,435 | Sampling Dirty Data for Matching Attributes | 2010 | SIGMOD | 6.1918164e-05 |
| 3,360 | Modeling and Querying Possible Repairs in Duplicate Detection | 2009 | VLDB | 7.1742067e-05 |
| 9,461 | BrewER: Entity Resolution On-Demand | 2023 | VLDB | 4.3366491e-05 |
| 7,061 | Serving Deep Learning Models with Deduplication from Relational Databases | 2022 | VLDB | 4.8463881e-05 |
| 4,619 | Crowd-Based Deduplication: An Adaptive Approach | 2015 | SIGMOD | 6.0444854e-05 |
| 6,042 | MDedup: Duplicate Detection with Matching Dependencies | 2020 | VLDB | 5.2405269e-05 |
| 10,807 | RadlER: Deduplicated Sampling On-Demand | 2025 | VLDB | 4.1945683e-05 |