Explore or Exploit? Effective Strategies for Disambiguating Large Databases
Summary: The paper addresses disambiguation in large databases under a limited cleaning budget, where both candidate quality and cleaning success are uncertain. The Explore-Exploit (EE) algorithm learns from ongoing cleaning to allocate the budget and outperforms greedy baselines. It is robust to unknown cleaning probabilities and is validated on real and synthetic data. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Reynold Cheng
2. Eric Lo
3. Xuan S. Yang
4. Ming-Hay Luk
5. Xiang Li
6. Xike Xie
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,722 | Progressive Approach to Relational Entity Resolution | 2014 | VLDB | 8.2338356e-05 |
| 7,185 | Certus: An Effective Entity Resolution Approach with Graph Differential Dependencies (GDDs) | 2019 | VLDB | 4.8066159e-05 |
| 9,043 | Query-Guided Resolution in Uncertain Databases | 2023 | SIGMOD | 4.4039656e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 74 | Efficient Query Evaluation on Probabilistic Databases | 2004 | VLDB | 0.00057857292 |
| 101 | ULDBs: Databases with Uncertainty and Lineage | 2006 | VLDB | 0.0004955674 |
| 382 | COMA - A system for flexible combination of schema matching approaches | 2002 | VLDB | 0.00024823252 |
| 467 | Evaluating Probabilistic Queries over Imprecise Data | 2003 | SIGMOD | 0.00022443768 |
| 477 | Model-Driven Data Acquisition in Sensor Networks | 2004 | VLDB | 0.00022221803 |
| 1,003 | Adaptive Filters for Continuous Queries over Distributed Data Streams | 2003 | SIGMOD | 0.00014698435 |
| 1,179 | Probabilistic Skylines on Uncertain Data | 2007 | VLDB | 0.00013457451 |
| 3,360 | Modeling and Querying Possible Repairs in Duplicate Detection | 2009 | VLDB | 7.1742067e-05 |
| 5,537 | Cleaning Uncertain Data with Quality Guarantees | 2008 | VLDB | 5.4522327e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,360 | Modeling and Querying Possible Repairs in Duplicate Detection | 2009 | VLDB | 7.1742067e-05 |
| 4,758 | Optimization for Active Learning-based Interactive Database Exploration | 2019 | VLDB | 5.9422515e-05 |
| 467 | Evaluating Probabilistic Queries over Imprecise Data | 2003 | SIGMOD | 0.00022443768 |
| 5,398 | Cleaning Inconsistencies in Information Extraction via Prioritized Repairs | 2014 | PODS | 5.5295577e-05 |
| 1,542 | Efficient Search for the Top-k Probable Nearest Neighbors in Uncertain Databases | 2008 | VLDB | 0.00011456321 |
| 12,835 | Incomplete Path Expressions and their Disambiguation | 1994 | SIGMOD | 4.1945683e-05 |
| 2,797 | Query-Oriented Data Cleaning with Oracles | 2015 | SIGMOD | 8.1108589e-05 |
| 7,702 | Counting and Enumerating (Preferred) Database Repairs | 2017 | PODS | 4.6736471e-05 |
| 9,043 | Query-Guided Resolution in Uncertain Databases | 2023 | SIGMOD | 4.4039656e-05 |
| 5,537 | Cleaning Uncertain Data with Quality Guarantees | 2008 | VLDB | 5.4522327e-05 |