Optimizing Complex Extraction Programs over Evolving Text Data
Summary: Extends Cyclex from a single IE blackbox to compositional workflows for evolving text corpora. Models and recycles complex IE programs, searches plan spaces for optimal execution, and validates gains on real rule-based and learning-based pipelines. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Fei Chen
2. Byron J. Gao
3. AnHai Doan
4. Jun Yang
5. Raghu Ramakrishnan
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,984 | Efficiently Incorporating User Feedback into Information Extraction and Integration Programs | 2009 | SIGMOD | 7.7796344e-05 |
| 5,652 | From Information to Knowledge: Harvesting Entities and Relationships from Web Sources | 2010 | PODS | 5.3903671e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 287 | Declarative Information Extraction Using Datalog with Embedded Extraction Predicates | 2007 | VLDB | 0.00028971272 |
| 759 | To Search or to Crawl? Towards a Query Optimizer for Text-Centric Tasks | 2006 | SIGMOD | 0.00017064615 |
| 1,406 | Personal Information Management with Semex | 2005 | SIGMOD | 0.00012163944 |
| 1,722 | Building Structured Web Community Portals: A Top-Down, Compositional, and Incremental Approach | 2007 | VLDB | 0.00010757784 |
| 2,066 | DBLife: A Community Information Management Platform for the Database Research Community | 2007 | CIDR | 9.6399561e-05 |
| 2,771 | A Relational Approach to Incrementally Extracting and Querying Structure in Unstructured Data | 2007 | VLDB | 8.1421432e-05 |
| 6,073 | Impliance: A Next Generation Information Management Appliance | 2007 | CIDR | 5.2253416e-05 |