Database Paper Browser


RadlER: Deduplicated Sampling On-Demand

Summary: RadlER produces clean stratified samples from dirty datasets in which a single real-world entity may be represented by several duplicate records, avoiding the bias those duplicates would otherwise introduce. It deduplicates on demand, cleaning only the entities the sample requires, and matches the target group distribution far more efficiently than baselines that deduplicate the full dataset before sampling. (summarized by gpt-5-mini on Feb 09 2026)
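To make the idea in the summary concrete, here is a minimal toy sketch of on-demand deduplicated stratified sampling. It is not RadlER's algorithm, which the summary does not describe in detail. Everything in it is an assumption made for illustration: the record fields (`id`, `entity_key`, `group`), the helper `resolve_entity` (a stand-in for a real entity matcher), and the rule of accepting a draw only when it hits the lowest-id record of its entity, which keeps entities with many duplicates from being over-sampled. It also assumes all duplicates of an entity carry the same `group` value.

```python
import random

def resolve_entity(record, records):
    """Hypothetical stand-in for an entity matcher.

    Returns every record that shares the drawn record's `entity_key`.
    A real system would run blocking and matching here instead.
    """
    return [r for r in records if r["entity_key"] == record["entity_key"]]

def sample_on_demand(records, targets, seed=0):
    """Draw `targets[group]` distinct entities per group, cleaning lazily.

    Returns (sample, cleaned), where `cleaned` counts how many times
    resolve_entity was invoked.
    """
    rng = random.Random(seed)
    order = list(records)
    rng.shuffle(order)                      # visit dirty records in random order
    remaining = dict(targets)
    sample, cleaned = [], 0
    for rec in order:
        if not any(remaining.values()):
            break                           # every group quota is filled
        if remaining.get(rec["group"], 0) == 0:
            continue                        # quota met: skip without cleaning
        cluster = resolve_entity(rec, records)   # clean only this entity
        cleaned += 1
        representative = min(cluster, key=lambda r: r["id"])
        if rec["id"] != representative["id"]:
            continue                        # reject duplicates to avoid size bias
        sample.append(representative)
        remaining[rec["group"]] -= 1
    return sample, cleaned

# Toy data: entity A has three duplicate records, C has two, B and D have one.
records = [
    {"id": 1, "entity_key": "A", "group": "x"},
    {"id": 2, "entity_key": "A", "group": "x"},
    {"id": 3, "entity_key": "A", "group": "x"},
    {"id": 4, "entity_key": "B", "group": "x"},
    {"id": 5, "entity_key": "C", "group": "y"},
    {"id": 6, "entity_key": "C", "group": "y"},
    {"id": 7, "entity_key": "D", "group": "y"},
]
sample, cleaned = sample_on_demand(records, {"x": 1, "y": 1}, seed=1)
print([r["entity_key"] for r in sample], "entities cleaned:", cleaned)
```

A full-dedup-then-sample baseline would resolve every entity before drawing anything; in this sketch the loop stops as soon as each group quota is filled, so entities that are never drawn are never cleaned.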

Paper ID: 13894
Venue: VLDB
Year: 2025
Pagerank: 4.1945683e-05
Overall Rank: 10,617 | 26.14%
DOI: 10.14778/3742728.3742742

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 1 of 1 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
10,807 | RadlER: Deduplicated Sampling On-Demand | 2025 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 32 of 32 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824
300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466
398 | Big Data Integration | 2013 | VLDB | 0.00024372588
517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035
643 | Corleone: Hands-Off Crowdsourcing for Entity Matching | 2014 | SIGMOD | 0.00018754451
712 | Magellan: Toward Building Entity Matching Management Systems | 2016 | VLDB | 0.00017732426
754 | Distributed Representations of Tuples for Entity Resolution | 2018 | VLDB | 0.00017117211
791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664
1,831 | Synthesizing Entity Matching Rules by Examples | 2018 | VLDB | 0.00010384082
2,175 | Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services | 2017 | SIGMOD | 9.3644117e-05
2,184 | A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data | 2014 | SIGMOD | 9.3429789e-05
2,573 | Query Optimization for Dynamic Imputation | 2017 | VLDB | 8.518235e-05
2,767 | A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching | 2020 | SIGMOD | 8.1513883e-05
3,162 | Looking for Trouble: Analyzing Classifier Behavior via Pattern Divergence | 2021 | SIGMOD | 7.4589576e-05
3,640 | Deep Learning for Blocking in Entity Matching: A Design Space Exploration | 2021 | VLDB | 6.8891671e-05
4,018 | Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching | 2023 | VLDB | 6.5244015e-05
4,104 | Online Entity Resolution Using an Oracle | 2016 | VLDB | 6.4493809e-05
4,273 | Cleaning Denial Constraint Violations through Relaxation | 2020 | SIGMOD | 6.3003864e-05
5,586 | QuERy: A Framework for Integrating Entity Resolution with Query Processing | 2016 | VLDB | 5.4219548e-05
6,042 | MDedup: Duplicate Detection with Matching Dependencies | 2020 | VLDB | 5.2405269e-05
6,175 | Query-Driven Approach to Entity Resolution | 2013 | VLDB | 5.169496e-05
6,467 | Tailoring Data Source Distributions for Fairness-aware Data Integration | 2021 | VLDB | 5.0528156e-05
6,643 | Query Refinement for Diversity Constraint Satisfaction | 2024 | VLDB | 4.9786132e-05
6,711 | Analyzing How BERT Performs Entity Matching | 2022 | VLDB | 4.9517546e-05
7,667 | Fast Detection of Denial Constraint Violations | 2022 | VLDB | 4.683767e-05
7,668 | Human-in-the-loop Data Integration | 2017 | VLDB | 4.6834075e-05
8,008 | Entity Resolution On-Demand | 2022 | VLDB | 4.6067684e-05
8,099 | Sparkly: A Simple yet Surprisingly Strong TF/IDF Blocker for Entity Matching | 2023 | VLDB | 4.5859317e-05
9,240 | ZIP: Lazy Imputation during Query Processing | 2024 | VLDB | 4.3690661e-05
9,461 | BrewER: Entity Resolution On-Demand | 2023 | VLDB | 4.3366491e-05
9,855 | Progressive Entity Matching: A Design Space Exploration | 2025 | SIGMOD | 4.269353e-05
9,856 | In-Database Data Imputation | 2024 | SIGMOD | 4.269353e-05

Semantically Similar Papers

Overall Rank | Paper | Year | Venue | Pagerank
8,959 | Reservoir Sampling over Joins | 2024 | SIGMOD | 4.4206222e-05
8,910 | R2D2: Reducing Redundancy and Duplication in Data Lakes | 2023 | SIGMOD | 4.427232e-05
936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549
4,435 | Sampling Dirty Data for Matching Attributes | 2010 | SIGMOD | 6.1918164e-05
3,360 | Modeling and Querying Possible Repairs in Duplicate Detection | 2009 | VLDB | 7.1742067e-05
9,461 | BrewER: Entity Resolution On-Demand | 2023 | VLDB | 4.3366491e-05
7,061 | Serving Deep Learning Models with Deduplication from Relational Databases | 2022 | VLDB | 4.8463881e-05
4,619 | Crowd-Based Deduplication: An Adaptive Approach | 2015 | SIGMOD | 6.0444854e-05
6,042 | MDedup: Duplicate Detection with Matching Dependencies | 2020 | VLDB | 5.2405269e-05
10,807 | RadlER: Deduplicated Sampling On-Demand | 2025 | VLDB | 4.1945683e-05