Database Paper Browser

Pytheas: Pattern-based Table Discovery in CSV Files

Summary: Pytheas combines pattern-based line classification with column-value coherency to discover tables in loosely structured CSV files. It achieves precision and recall above 95% (versus roughly 89% and 81% for the approaches it is compared against), generalizes to data from other countries, and provides a confidence measure for flagging potential errors. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12100
Venue: VLDB
Year: 2020
Pagerank: 6.5840643e-05
Overall Rank: 3,963 | 72.44%
DOI: 10.14778/3407790.3407810

Authors

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
7,102 | Mondrian: Spreadsheet Layout Detection | 2022 | SIGMOD | 4.8307982e-05
7,807 | Pollock: A Data Loading Benchmark | 2023 | VLDB | 4.6457732e-05
8,503 | A Demonstration of KGLac: A Data Discovery and Enrichment Platform for Data Science | 2021 | VLDB | 4.496339e-05
11,420 | Detecting Layout Templates in Complex Multiregion Files | 2022 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers