Computational Aspects of Resilient Data Extraction from Semistructured Sources
Summary: Formalizes resilient extraction using "unambiguous extraction expressions" (regular expressions with extra structure) and defines a resilient extractor as a maximal such expression. Derives characterization theorems for maximal expressions, complexity bounds for testing them, and algorithms for synthesizing maximal extractors. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Hasan Davulcu
2. Guizhen Yang
3. Michael Kifer
4. I.V. Ramakrishnan
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,698 | Visual Web Information Extraction with Lixto* | 2001 | VLDB | 8.2753317e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 114 | A Query Language and Optimization Techniques for Unstructured Data | 1996 | SIGMOD | 0.00046339735 |
| 385 | NoDoSE - A Tool for Semi-Automatically Extracting Structured and Semistructured Data from Text Documents | 1998 | SIGMOD | 0.00024795739 |
| 1,314 | Semistructured Data | 1997 | PODS | 0.0001263326 |
| 1,919 | Cut and Paste | 1997 | PODS | 0.00010094755 |
| 3,150 | Template-Based Wrappers in the TSIMMIS System | 1997 | SIGMOD | 7.4736975e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,319 | Expressive and Flexible Access to Web-Extracted Data: A Keyword-based Structured Query Language | 2010 | SIGMOD | 9.0387108e-05 |
| 4,137 | Exploiting Content Redundancy for Web Information Extraction | 2010 | VLDB | 6.4181549e-05 |
| 2,362 | Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax | 2004 | SIGMOD | 8.9582251e-05 |
| 1,938 | Split-Correctness in Information Extraction | 2019 | PODS | 0.00010028895 |
| 4,440 | Robust Web Extraction: An Approach Based on a Probabilistic Tree-Edit Model | 2009 | SIGMOD | 6.187819e-05 |
| 587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348 |
| 6,751 | Optimal Schemes for Robust Web Extraction | 2011 | VLDB | 4.939042e-05 |
| 3,285 | Using the Structure of Web Sites for Automatic Segmentation of Tables | 2004 | SIGMOD | 7.2759001e-05 |
| 11,240 | Autonomously Computable Information Extraction | 2023 | VLDB | 4.1945683e-05 |
| 7,826 | The Smallest Extraction Problem | 2021 | VLDB | 4.6416742e-05 |