Benchmarking Declarative Approximate Selection Predicates
Summary: Benchmarks declarative approximate selection predicates and introduces probabilistic similarity predicates for data quality, based on language models and hidden Markov models (HMMs), each with a declarative realization. Groups existing predicates into classes and reports their runtime and accuracy on data-cleaning tasks. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Amit Chandel
2. Oktie Hassanzadeh
3. Nick Koudas
4. Mohammad Sadoghi
5. Divesh Srivastava
Incoming Citations (Sorted by Pagerank)
Showing 12 of 12 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 125 | Approximate String Joins in a Database (Almost) for Free | 2001 | VLDB | 0.00044847972 |
| 150 | Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity | 1998 | SIGMOD | 0.00041055843 |
| 155 | Robust and Efficient Fuzzy Match for Online Data Cleaning | 2003 | SIGMOD | 0.00040637896 |
| 199 | Declarative Data Cleaning: Language, Model, and Algorithms | 2001 | VLDB | 0.00035041015 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 280 | Eliminating Fuzzy Duplicates in Data Warehouses | 2002 | VLDB | 0.00029113044 |
| 322 | Record Linkage: Similarity Measures and Algorithms | 2006 | SIGMOD | 0.00027518768 |
| 4,026 | Flexible String Matching Against Large Databases in Practice | 2004 | VLDB | 6.5169976e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,954 | Efficiently Approximating Selectivity Functions using Low Overhead Regression Models | 2020 | VLDB | 6.5926838e-05 |
| 372 | Selectivity Estimation using Probabilistic Models | 2001 | SIGMOD | 0.00025354779 |
| 4,387 | Hybrid In-Database Inference for Declarative Information Extraction | 2011 | SIGMOD | 6.2320072e-05 |
| 6,739 | Benchmarking Approximate Consistent Query Answering | 2021 | PODS | 4.9449088e-05 |
| 3,529 | Merging the Results of Approximate Match Operations | 2004 | VLDB | 7.0059524e-05 |
| 329 | Accelerating Machine Learning Inference with Probabilistic Predicates | 2018 | SIGMOD | 0.00027249545 |
| 74 | Efficient Query Evaluation on Probabilistic Databases | 2004 | VLDB | 0.00057857292 |
| 2,364 | Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries | 2020 | SIGMOD | 8.9554751e-05 |
| 9,351 | On Efficient Approximate Queries over Machine Learning Models | 2023 | VLDB | 4.3524472e-05 |
| 4,442 | Approximating Predicates and Expressive Queries on Probabilistic Databases | 2008 | PODS | 6.186154e-05 |