Provenance-based Dictionary Refinement in Information Extraction
Summary: The paper uses the provenance of extraction outputs to drive dictionary refinement, formulating an optimization problem that maximizes extraction quality by pruning dictionary entries. It proposes efficient algorithms together with a probabilistic model for incomplete labeling, validates them on real extractors, and notes implications for view maintenance in relational settings. (summarized by gpt-5-nano on Feb 09 2026)
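The pruning objective mentioned in the summary can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes a simplified setting in which every extracted result is traced (via its provenance) to exactly one dictionary entry, every result carries a correct/incorrect label, and quality is measured by F1 with recall taken relative to the correct results found before pruning. The names `f1`, `greedy_prune`, and the sample entries are hypothetical.

```python
from collections import Counter


def f1(tp, fp, total_true):
    """F1 of the kept results; recall is relative to the correct results before pruning."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / total_true
    return 2 * precision * recall / (precision + recall)


def greedy_prune(labeled_results):
    """Greedy heuristic (an assumption, not the paper's method).

    labeled_results: list of (entry, is_correct) pairs, where `entry` is the
    single dictionary entry in the provenance of that result.
    Returns (removed_entries, f1_after_pruning).
    """
    tp, fp = Counter(), Counter()
    for entry, ok in labeled_results:
        (tp if ok else fp)[entry] += 1

    kept = set(tp) | set(fp)
    total_true = sum(tp.values())
    cur_tp, cur_fp = total_true, sum(fp.values())
    best = f1(cur_tp, cur_fp, total_true)
    removed = []

    while True:
        candidate, candidate_score = None, best
        # Try removing each remaining entry; removing an entry drops all its results.
        for e in sorted(kept):
            score = f1(cur_tp - tp[e], cur_fp - fp[e], total_true)
            if score > candidate_score:
                candidate, candidate_score = e, score
        if candidate is None:  # no single removal improves F1
            break
        kept.remove(candidate)
        removed.append(candidate)
        cur_tp -= tp[candidate]
        cur_fp -= fp[candidate]
        best = candidate_score
    return removed, best


# Hypothetical labeled output of a person-name extractor.
sample = (
    [("april", False)] * 3 + [("april", True)]
    + [("smith", True)] * 4
    + [("lee", True)] * 2
    + [("may", False)] * 2
)
removed, score = greedy_prune(sample)
```

The greedy loop only stands in for the optimization step; it carries no optimality guarantee, and it does not model the incomplete-labeling case that the summary says the paper handles with a probabilistic model.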
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Sudeepa Roy
2. Laura Chiticariu
3. Vitaly Feldman
4. Frederick R. Reiss
5. Huaiyu Zhu
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 10 of 10 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 31 | Provenance Semirings | 2007 | PODS | 0.0007857786 |
| 287 | Declarative Information Extraction Using Datalog with Embedded Extraction Predicates | 2007 | VLDB | 0.00028971272 |
| 655 | On Propagation of Deletions and Annotations Through Views | 2002 | PODS | 0.00018608845 |
| 1,317 | Harvesting Relational Tables from Lists on the Web | 2009 | VLDB | 0.00012625853 |
| 2,602 | Tracing Data Errors with View-Conditioned Causality | 2011 | SIGMOD | 8.4667197e-05 |
| 2,984 | Efficiently Incorporating User Feedback into Information Extraction and Integration Programs | 2009 | SIGMOD | 7.7796344e-05 |
| 3,314 | Computing Query Probability with Incidence Algebras | 2010 | PODS | 7.2318581e-05 |
| 3,477 | Toward Best-Effort Information Extraction | 2008 | SIGMOD | 7.0583481e-05 |
| 4,971 | Maximizing Conjunctive Views in Deletion Propagation | 2011 | PODS | 5.7938195e-05 |
| 6,534 | Automatic Rule Refinement for Information Extraction | 2010 | VLDB | 5.0244622e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,186 | On Provenance Minimization | 2011 | PODS | 5.166082e-05 |
| 4,851 | Provenance for Natural Language Queries | 2017 | VLDB | 5.8768322e-05 |
| 760 | Creating Probabilistic Databases from Information Extraction Models | 2006 | VLDB | 0.00017053935 |
| 8,148 | When Speed Has a Price: Fast Information Extraction Using Approximate Algorithms | 2013 | VLDB | 4.5754467e-05 |
| 11,471 | On Optimizing the Trade-off between Privacy and Utility in Data Provenance | 2021 | SIGMOD | 4.1945683e-05 |
| 3,578 | Efficient Approximate Entity Extraction with Edit Distance Constraints | 2009 | SIGMOD | 6.9503858e-05 |
| 4,983 | Querying Probabilistic Information Extraction | 2010 | VLDB | 5.7870787e-05 |
| 652 | On the Provenance of Non-Answers to Queries over Extracted Data | 2008 | VLDB | 0.00018634477 |
| 11,240 | Autonomously Computable Information Extraction | 2023 | VLDB | 4.1945683e-05 |
| 6,534 | Automatic Rule Refinement for Information Extraction | 2010 | VLDB | 5.0244622e-05 |