I4E: Interactive Investigation of Iterative Information Extraction
Summary: I4E supports interactive, post-extraction investigation of iterative information extraction (IIE) systems. It formalizes three phases (explain, diagnose, and repair) and provides algorithms for them. An evaluation on a 500M-document web corpus indicates that post-extraction reasoning and repair are effective and yield strong gains for users. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Anish Das Sarma
2. Alpa Jain
3. Divesh Srivastava
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,105 | Data X-Ray: A Diagnostic Tool for Data Errors | 2015 | SIGMOD | 7.5568954e-05 |
| 6,534 | Automatic Rule Refinement for Information Extraction | 2010 | VLDB | 5.0244622e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 10 of 10 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 31 | Provenance Semirings | 2007 | PODS | 0.0007857786 |
| 101 | ULDBs: Databases with Uncertainty and Lineage | 2006 | VLDB | 0.0004955674 |
| 611 | Lineage Tracing for General Data Warehouse Transformations | 2001 | VLDB | 0.00019231115 |
| 652 | On the Provenance of Non-Answers to Queries over Extracted Data | 2008 | VLDB | 0.00018634477 |
| 759 | To Search or to Crawl? Towards a Query Optimizer for Text-Centric Tasks | 2006 | SIGMOD | 0.00017064615 |
| 760 | Creating Probabilistic Databases from Information Extraction Models | 2006 | VLDB | 0.00017053935 |
| 1,395 | Structured Querying of Web Text: A Technical Challenge | 2007 | CIDR | 0.00012207039 |
| 1,824 | DBNotes: A Post-It System for Relational Databases based on Provenance | 2005 | SIGMOD | 0.00010405194 |
| 1,970 | Approximate Lineage for Probabilistic Databases | 2008 | VLDB | 9.896375e-05 |
| 2,524 | Provenance Management in Curated Databases | 2006 | SIGMOD | 8.6017899e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,758 | Optimization for Active Learning-based Interactive Database Exploration | 2019 | VLDB | 5.9422515e-05 |
| 2,617 | Extraction and Integration of Partially Overlapping Web Sources | 2013 | VLDB | 8.4462621e-05 |
| 13,626 | Managing Information Extraction [Tutorial Outline] | 2006 | SIGMOD | - |
| 13,491 | The SystemT IDE: An Integrated Development Environment for Information Extraction Rules | 2011 | SIGMOD | - |
| 11,240 | Autonomously Computable Information Extraction | 2023 | VLDB | 4.1945683e-05 |
| 6,534 | Automatic Rule Refinement for Information Extraction | 2010 | VLDB | 5.0244622e-05 |
| 8,613 | Synthesizing Extraction Rules from User Examples with SEER | 2017 | SIGMOD | 4.4849545e-05 |
| 11,256 | Self-Training for Label-Efficient Information Extraction from Semi-Structured Web-Pages | 2023 | VLDB | 4.1945683e-05 |
| 4,983 | Querying Probabilistic Information Extraction | 2010 | VLDB | 5.7870787e-05 |
| 11,844 | Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale | 2016 | SIGMOD | 4.1945683e-05 |