Spanners: A Formal Framework for Information Extraction
Summary: Proposes "spanners": functions that map a string to a relation over its spans. The paper formalizes primitive spanner representations (regular expressions with capture variables and two automata models) and their algebraic closure. It proves that one automaton model is exactly as expressive as regular expressions with capture variables, while the other characterizes the regular spanners. It also introduces core spanners, which add string-equality selection, and shows that regular spanners are closed under difference whereas core spanners are not. (summarized by gpt-5-mini on Feb 09 2026)
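As an illustration of the notions named in the summary, here is a minimal Python sketch of a spanner as a function from a string to a relation over spans. It is not the paper's construction: the helper names (`extract`, `join`, `select_equal`) are hypothetical, spans are written as 0-based half-open `(start, end)` pairs (which may differ from the paper's notation), and the brute-force enumeration stands in for the automata models.

```python
import re

def extract(pattern, text, var):
    """Unary spanner: every span (i, j) of text whose substring fully matches pattern.

    Each tuple of the resulting relation is a dict {variable: span}.
    """
    rx = re.compile(pattern)
    n = len(text)
    return [
        {var: (i, j)}
        for i in range(n + 1)
        for j in range(i, n + 1)
        if rx.fullmatch(text, i, j)
    ]

def join(r1, r2):
    """Natural join: combine tuples that agree on all shared variables."""
    return [
        {**t1, **t2}
        for t1 in r1
        for t2 in r2
        if all(t1[v] == t2[v] for v in t1.keys() & t2.keys())
    ]

def select_equal(rel, text, x, y):
    """String-equality selection: keep tuples whose x and y spans cover equal substrings."""
    return [t for t in rel if text[slice(*t[x])] == text[slice(*t[y])]]

text = "ab ab"
xs = extract(r"[a-z]+", text, "x")   # spans of lowercase words and their sub-words
ys = extract(r"[a-z]+", text, "y")
pairs = select_equal(join(xs, ys), text, "x", "y")
```

The first two helpers only use regex-style extraction and join, which are among the operations the summary associates with regular spanners. `select_equal` compares the substrings under two spans rather than the spans themselves, which is the string-equality selection that the summary says distinguishes core spanners.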
Authors
1. Ronald Fagin
2. Benny Kimelfeld
3. Frederick Reiss
4. Stijn Vansummeren
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,042 | Dichotomies in the Complexity of Preferred Repairs | 2015 | PODS | 7.669374e-05 |
| 5,398 | Cleaning Inconsistencies in Information Extraction via Prioritized Repairs | 2014 | PODS | 5.5295577e-05 |
| 6,347 | A Relational Framework for Classifier Engineering | 2017 | PODS | 5.1019568e-05 |
| 9,423 | Database Principles in Information Extraction | 2014 | PODS | 4.3441378e-05 |
| 12,062 | Next Generation Data Analytics at IBM Research | 2013 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 49 | Consistent Query Answers in Inconsistent Databases | 1999 | PODS | 0.00067660624 |
| 256 | GraphLog: a Visual Formalism for Real Life Recursion | 1990 | PODS | 0.00030259041 |
| 363 | A Graphical Query Language Supporting Recursion | 1987 | SIGMOD | 0.00025715157 |
| 810 | Query Containment for Conjunctive Queries With Regular Expressions | 1998 | PODS | 0.00016428374 |
| 6,534 | Automatic Rule Refinement for Information Extraction | 2010 | VLDB | 5.0244622e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,198 | Algebras for Querying Text Regions (Extended Abstract) | 1995 | PODS | 5.6346171e-05 |
| 10,900 | Generalized Core Spanner Inexpressibility via Ehrenfeucht-Fraïssé Games for FC | 2024 | PODS | 4.1945683e-05 |
| 6,534 | Automatic Rule Refinement for Information Extraction | 2010 | VLDB | 5.0244622e-05 |
| 6,958 | Computational Aspects of Resilient Data Extraction from Semistructured Sources | 2000 | PODS | 4.8857878e-05 |
| 11,240 | Autonomously Computable Information Extraction | 2023 | VLDB | 4.1945683e-05 |
| 8,753 | Document Spanners — A Brief Overview of Concepts, Results, and Recent Developments | 2022 | PODS | 4.456315e-05 |
| 13,162 | SpannerLib: Embedding Declarative Information Extraction in an Imperative Workflow | 2024 | VLDB | - |
| 5,398 | Cleaning Inconsistencies in Information Extraction via Prioritized Repairs | 2014 | PODS | 5.5295577e-05 |
| 2,929 | Complexity Bounds for Relational Algebra over Document Spanners | 2019 | PODS | 7.8800307e-05 |
| 1,938 | Split-Correctness in Information Extraction | 2019 | PODS | 0.00010028895 |