Automatic Rule Refinement for Information Extraction
Summary: Uses data-provenance (lineage) techniques to guide the automatic refinement of rule-based information extraction programs. Given extractions labeled as correct or incorrect, it produces a ranked list of candidate rule modifications for an expert to choose from. The approach is implemented in SystemT and shown to be effective at improving the rules. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Bin Liu
2. Laura Chiticariu
3. Vivian Chu
4. H.V. Jagadish
5. Frederick R. Reiss
Incoming Citations (Sorted by Pagerank)
Showing 8 of 8 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05 |
| 4,971 | Maximizing Conjunctive Views in Deletion Propagation | 2011 | PODS | 5.7938195e-05 |
| 5,398 | Cleaning Inconsistencies in Information Extraction via Prioritized Repairs | 2014 | PODS | 5.5295577e-05 |
| 6,490 | Spanners: A Formal Framework for Information Extraction | 2013 | PODS | 5.0431719e-05 |
| 8,613 | Synthesizing Extraction Rules from User Examples with SEER | 2017 | SIGMOD | 4.4849545e-05 |
| 9,423 | Database Principles in Information Extraction | 2014 | PODS | 4.3441378e-05 |
| 12,052 | Provenance-based Dictionary Refinement in Information Extraction | 2013 | SIGMOD | 4.1945683e-05 |
| 13,491 | The SystemT IDE: An Integrated Development Environment for Information Extraction Rules | 2011 | SIGMOD | - |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 31 | Provenance Semirings | 2007 | PODS | 0.0007857786 |
| 287 | Declarative Information Extraction Using Datalog with Embedded Extraction Predicates | 2007 | VLDB | 0.00028971272 |
| 487 | Why Not? | 2009 | SIGMOD | 0.00022050218 |
| 652 | On the Provenance of Non-Answers to Queries over Extracted Data | 2008 | VLDB | 0.00018634477 |
| 2,562 | Explaining Missing Answers to SPJUA Queries | 2010 | VLDB | 8.5386194e-05 |
| 3,477 | Toward Best-Effort Information Extraction | 2008 | SIGMOD | 7.0583481e-05 |
| 7,280 | I4E: Interactive Investigation of Iterative Information Extraction | 2010 | SIGMOD | 4.778826e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,423 | Database Principles in Information Extraction | 2014 | PODS | 4.3441378e-05 |
| 7,280 | I4E: Interactive Investigation of Iterative Information Extraction | 2010 | SIGMOD | 4.778826e-05 |
| 7,800 | Data Management for Large Rule Systems | 1991 | VLDB | 4.6474123e-05 |
| 5,398 | Cleaning Inconsistencies in Information Extraction via Prioritized Repairs | 2014 | PODS | 5.5295577e-05 |
| 4,851 | Provenance for Natural Language Queries | 2017 | VLDB | 5.8768322e-05 |
| 652 | On the Provenance of Non-Answers to Queries over Extracted Data | 2008 | VLDB | 0.00018634477 |
| 4,156 | Uncertainty Management in Rule-Based Information Extraction Systems | 2009 | SIGMOD | 6.3999205e-05 |
| 11,240 | Autonomously Computable Information Extraction | 2023 | VLDB | 4.1945683e-05 |
| 12,052 | Provenance-based Dictionary Refinement in Information Extraction | 2013 | SIGMOD | 4.1945683e-05 |
| 13,491 | The SystemT IDE: An Integrated Development Environment for Information Extraction Rules | 2011 | SIGMOD | - |