Split-Correctness in Information Extraction
Summary: Formalizes “split-correctness” for document spanners to detect when extractors can be evaluated independently on segments (sentences, k‑grams, requests), enabling parallel and incremental processing. Provides complexity results for regular spanners and variants with black‑box split constraints. (summarized by gpt-5-mini on Feb 09 2026)
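To make the notion concrete, the following Python sketch tests split-correctness by brute force on one document. It is an illustration under simplifying assumptions, not the paper's decision procedure. The names `regex_spanner`, `sentence_splitter`, `evaluate_on_segments`, and `split_correct_on` are hypothetical. Spans are half-open character offsets. The extractor is a unary regex spanner that returns every span whose substring fully matches a pattern. The same extractor is reused on every segment. A single document can only refute split-correctness; the complexity results above concern deciding it over all documents.

```python
import re
from typing import Callable, Set, Tuple

Span = Tuple[int, int]  # half-open [start, end) character offsets
Extractor = Callable[[str], Set[Span]]
Splitter = Callable[[str], Set[Span]]


def regex_spanner(pattern: str) -> Extractor:
    """Hypothetical unary spanner: every span whose substring fully matches `pattern`."""
    compiled = re.compile(pattern)

    def run(doc: str) -> Set[Span]:
        n = len(doc)
        return {
            (i, j)
            for i in range(n + 1)
            for j in range(i, n + 1)
            if compiled.fullmatch(doc, i, j)  # pattern must cover exactly doc[i:j]
        }

    return run


def sentence_splitter(doc: str) -> Set[Span]:
    """Hypothetical splitter: each run of non-'.' characters plus its closing '.'."""
    return {m.span() for m in re.finditer(r"[^.]+\.?", doc)}


def evaluate_on_segments(extractor: Extractor, splitter: Splitter, doc: str) -> Set[Span]:
    """Run the extractor on each segment independently, then shift spans to document offsets."""
    results: Set[Span] = set()
    for seg_start, seg_end in splitter(doc):
        for s, e in extractor(doc[seg_start:seg_end]):
            results.add((seg_start + s, seg_start + e))
    return results


def split_correct_on(extractor: Extractor, splitter: Splitter, doc: str) -> bool:
    """True iff segment-wise evaluation reproduces whole-document evaluation on this document."""
    return extractor(doc) == evaluate_on_segments(extractor, splitter, doc)


if __name__ == "__main__":
    doc = "Alice met Bob. Carol left."
    names = regex_spanner(r"[A-Z][a-z]+")            # matches never contain a '.'
    cross = regex_spanner(r"[A-Za-z]+\. [A-Za-z]+")  # e.g. "Bob. Carol"
    print(split_correct_on(names, sentence_splitter, doc))  # True
    print(split_correct_on(cross, sentence_splitter, doc))  # False
```

On `"Alice met Bob. Carol left."`, the capitalized-word extractor passes because none of its matches crosses a period, whereas `[A-Za-z]+\. [A-Za-z]+` fails because its match `Bob. Carol` spans two sentences and is lost when each sentence is processed separately.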
Authors
1. Johannes Doleschal
2. Benny Kimelfeld
3. Wim Martens
4. Yoav Nahshon
5. Frank Neven
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Overall Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,563 | Spanner Evaluation over SLP-Compressed Documents | 2021 | PODS | 6.9690833e-05 |
| 6,820 | Conjunctive Regular Path Queries with String Variables | 2020 | PODS | 4.9157306e-05 |
| 8,753 | Document Spanners — A Brief Overview of Concepts, Results, and Recent Developments | 2022 | PODS | 4.456315e-05 |
| 9,157 | REmatch: a novel regex engine for finding all matches | 2023 | VLDB | 4.3849295e-05 |
| 11,240 | Autonomously Computable Information Extraction | 2023 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Overall Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 287 | Declarative Information Extraction Using Datalog with Embedded Extraction Predicates | 2007 | VLDB | 0.00028971272 |
| 667 | Incremental Knowledge Base Construction Using DeepDive | 2015 | VLDB | 0.00018440557 |
| 3,622 | Distributed XML Design | 2009 | PODS | 6.9066476e-05 |
| 8,215 | Parallel-Correctness and Transferability for Conjunctive Queries | 2015 | PODS | 4.5577562e-05 |
| 12,216 | Schema Design for XML Repositories: Complexity and Tractability | 2010 | PODS | 4.1945683e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,919 | Efficient Indexing and Querying over Syntactically Annotated Trees | 2012 | VLDB | 4.8925595e-05 |
| 7,280 | I4E: Interactive Investigation of Iterative Information Extraction | 2010 | SIGMOD | 4.778826e-05 |
| 5,379 | Scalable Ad-hoc Entity Extraction from Text Collections | 2008 | VLDB | 5.5405989e-05 |
| 6,534 | Automatic Rule Refinement for Information Extraction | 2010 | VLDB | 5.0244622e-05 |
| 2,005 | Record-Boundary Discovery in Web Documents | 1999 | SIGMOD | 9.8112591e-05 |
| 5,398 | Cleaning Inconsistencies in Information Extraction via Prioritized Repairs | 2014 | PODS | 5.5295577e-05 |
| 6,490 | Spanners: A Formal Framework for Information Extraction | 2013 | PODS | 5.0431719e-05 |
| 6,958 | Computational Aspects of Resilient Data Extraction from Semistructured Sources | 2000 | PODS | 4.8857878e-05 |
| 8,148 | When Speed Has a Price: Fast Information Extraction Using Approximate Algorithms | 2013 | VLDB | 4.5754467e-05 |
| 11,240 | Autonomously Computable Information Extraction | 2023 | VLDB | 4.1945683e-05 |