A framework for annotating CSV-like data
Summary: A framework for annotating CSV-like data, including noisy variants, with metadata, using an extended regular-expression selection language and annotation rules. The approach yields an output-sensitive evaluator whose running time is linear in the input size; experiments on real data confirm its practical efficiency. (summarized by gpt-5-nano on Feb 09 2026)
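To make the general idea of rule-based cell annotation concrete, here is a minimal sketch. It is not the paper's selection language or evaluator. It assumes standard CSV input parsed with Python's `csv` module and a hypothetical rule format, a list of `(label, pattern)` pairs using ordinary `re` patterns. The sketch makes a single pass over the cells and checks each rule against each cell. Annotations are yielded lazily as they are found. It does not reproduce the extended regex features or the complexity guarantees described in the summary.

```python
import csv
import io
import re
from typing import Iterator, NamedTuple


class Annotation(NamedTuple):
    row: int
    col: int
    label: str


# Hypothetical rule set: a label is attached to every cell whose full content
# matches the pattern. These are plain `re` patterns, not the paper's language.
RULES = [
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("number", re.compile(r"-?\d+(?:\.\d+)?")),
    ("empty", re.compile(r"\s*")),
]


def annotate(text: str, rules=RULES) -> Iterator[Annotation]:
    """Yield (row, col, label) annotations in one pass over the CSV cells."""
    reader = csv.reader(io.StringIO(text))
    for r, row in enumerate(reader):
        for c, cell in enumerate(row):
            for label, pattern in rules:
                # Lazily emit each match instead of materializing all results.
                if pattern.fullmatch(cell):
                    yield Annotation(r, c, label)


if __name__ == "__main__":
    data = "id,when,amount\n1,2020-01-31,9.5\n2,,-3\n"
    for a in annotate(data):
        print(a)
```

On the sample input, the header row gets no annotations. The data rows get `number`, `date`, and `empty` labels where their cells match. A real system for noisy CSV would also need to handle irregular delimiters and ragged rows, which `csv.reader` does not repair.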
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,963 | Pytheas: Pattern-based Table Discovery in CSV Files | 2020 | VLDB | 6.5840643e-05 |
| 7,807 | Pollock: A Data Loading Benchmark | 2023 | VLDB | 4.6457732e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 3 of 3 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 287 | Declarative Information Extraction Using Datalog with Embedded Extraction Predicates | 2007 | VLDB | 0.00028971272 |
| 561 | An Annotation Management System for Relational Databases | 2004 | VLDB | 0.00020115419 |
| 2,114 | Rondo: A Programming Platform for Generic Model Management | 2003 | SIGMOD | 9.5268855e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,398 | Cleaning Inconsistencies in Information Extraction via Prioritized Repairs | 2014 | PODS | 5.5295577e-05 |
| 11,874 | Graph-based Exploration of Non-graph Datasets | 2016 | VLDB | 4.1945683e-05 |
| 3,963 | Pytheas: Pattern-based Table Discovery in CSV Files | 2020 | VLDB | 6.5840643e-05 |
| 2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05 |
| 9,379 | GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example | 2023 | SIGMOD | 4.3462787e-05 |
| 4,092 | Structured Annotations of Web Queries | 2010 | SIGMOD | 6.4561959e-05 |
| 11,240 | Autonomously Computable Information Extraction | 2023 | VLDB | 4.1945683e-05 |
| 7,807 | Pollock: A Data Loading Benchmark | 2023 | VLDB | 4.6457732e-05 |
| 8,007 | A Grammar-based Entity Representation Framework for Data Cleaning | 2009 | SIGMOD | 4.6068018e-05 |
| 3,437 | Speculative Distributed CSV Data Parsing for Big Data Analytics | 2019 | SIGMOD | 7.0942161e-05 |