Query-Oriented Data Cleaning with Oracles
Summary: QOCO is a query-oriented data cleaning approach that edits the database, guided by a crowd of domain-expert oracles, to improve query results. The paper shows that minimizing the number of oracle interactions is NP-hard, proposes heuristics, and evaluates a prototype on correcting incorrect and missing tuples.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 5000
- Venue: SIGMOD
- Year: 2015
- Pagerank: 8.1108589e-05
- Overall Rank: 2,797 | 80.55%
- DOI: 10.1145/2723372.2737786
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 13 of 13 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
| 791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 3,299 | SCODED: Statistical Constraint Oriented Data Error Detection | 2020 | SIGMOD | 7.2546659e-05 |
| 3,773 | Cleaning Crowdsourced Labels Using Oracles for Statistical Classification | 2019 | VLDB | 6.7758649e-05 |
| 4,126 | Waldo: An Adaptive Human Interface for Crowd Entity Resolution | 2017 | SIGMOD | 6.4314729e-05 |
| 7,575 | Human-in-the-loop Outlier Detection | 2020 | SIGMOD | 4.7068909e-05 |
| 7,766 | ICARUS: Minimizing Human Effort in Iterative Data Completion | 2018 | VLDB | 4.6564959e-05 |
| 9,043 | Query-Guided Resolution in Uncertain Databases | 2023 | SIGMOD | 4.4039656e-05 |
| 9,196 | QOCO: A Query Oriented Data Cleaning System with Oracles | 2015 | VLDB | 4.3749064e-05 |
| 9,221 | VisClean: Interactive Cleaning for Progressive Visualization | 2020 | VLDB | 4.3699444e-05 |
| 11,178 | LinCQA: Faster Consistent Query Answering with Linear Time Guarantees | 2023 | SIGMOD | 4.1945683e-05 |
| 11,399 | ActivePDB: Active Probabilistic Databases | 2022 | VLDB | 4.1945683e-05 |
| 11,680 | WiClean: A System for Fixing Wikipedia Interlinks Using Revision History Patterns | 2019 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 28 of 28 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| 31 | Provenance Semirings | 2007 | PODS | 0.0007857786 |
| 94 | CrowdDB: Answering Queries with Crowdsourcing | 2011 | SIGMOD | 0.00051013264 |
| 112 | Potter's Wheel: An Interactive Data Cleaning System | 2001 | VLDB | 0.00047045036 |
| 173 | Schema Mapping as Query Discovery | 2000 | VLDB | 0.00038627829 |
| 263 | CrowdER: Crowdsourcing Entity Resolution | 2012 | VLDB | 0.00029862413 |
| 294 | Using Schema Matching to Simplify Heterogeneous Data Translation | 1998 | VLDB | 0.00028669519 |
| 378 | Towards Estimation Error Guarantees for Distinct Values | 2000 | PODS | 0.0002497492 |
| 487 | Why Not? | 2009 | SIGMOD | 0.00022050218 |
| 652 | On the Provenance of Non-Answers to Queries over Extracted Data | 2008 | VLDB | 0.00018634477 |
| 655 | On Propagation of Deletions and Annotations Through Views | 2002 | PODS | 0.00018608845 |
| 767 | Explaining differences in multidimensional aggregates | 1999 | VLDB | 0.00016981309 |
| 809 | Curated Databases | 2008 | PODS | 0.00016430384 |
| 1,119 | The Complexity of Causality and Responsibility for Query Answers and non-Answers | 2011 | VLDB | 0.0001386199 |
| 1,125 | How to ConQueR Why-Not Questions | 2010 | SIGMOD | 0.00013845652 |
| 1,164 | CrowdScreen: Algorithms for Filtering Data with Humans | 2012 | SIGMOD | 0.00013564823 |
| 1,242 | Question Selection for Crowd Entity Resolution | 2013 | VLDB | 0.00013096655 |
| 1,699 | Sensitivity Analysis and Explanations for Robust Query Evaluation in Probabilistic Databases | 2011 | SIGMOD | 0.00010858983 |
| 2,184 | A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data | 2014 | SIGMOD | 9.3429789e-05 |
| 2,334 | Counting with the Crowd | 2013 | VLDB | 9.0161817e-05 |
| 2,562 | Explaining Missing Answers to SPJUA Queries | 2010 | VLDB | 8.5386194e-05 |
| 2,722 | Progressive Approach to Relational Entity Resolution | 2014 | VLDB | 8.2338356e-05 |
| 2,790 | Artemis: A System for Analyzing Missing Answers | 2009 | VLDB | 8.1239026e-05 |
| 3,067 | CrowdFill: Collecting Structured Data from the Crowd | 2014 | SIGMOD | 7.6180371e-05 |
| 3,100 | Crowd Mining | 2013 | SIGMOD | 7.5634778e-05 |
| 4,416 | CrowdMatcher: Crowd-Assisted Schema Matching | 2014 | SIGMOD | 6.2039225e-05 |
| 4,479 | Optimal Crowd-Powered Rating and Filtering Algorithms | 2014 | VLDB | 6.149053e-05 |
| 4,971 | Maximizing Conjunctive Views in Deletion Propagation | 2011 | PODS | 5.7938195e-05 |
| 8,875 | CerFix: A System for Cleaning Data with Certain Fixes | 2011 | VLDB | 4.430475e-05 |
Semantically Similar Papers