Database Paper Browser

GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example

Summary: GIO learns a mapping from raw text to matrix or frame layouts from a sample and automatically generates a fast multi-threaded reader. It supports CSV, LibSVM, MatrixMarket, and nested formats; both the mappings and the generated readers are editable, and the readers are competitive with hand-written parsers. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6623
Venue: SIGMOD
Year: 2023
Pagerank: 4.3462787e-05
Overall Rank: 9,379 | 34.76%
DOI: 10.1145/3589265

Incoming Citations (Sorted by Pagerank)

Showing 1 of 1 citing papers.

Rank Citing Paper Year Venue Pagerank
8,204 ELEET: Efficient Learned Query Execution over Text and Tables 2024 VLDB 4.5594273e-05

Outgoing Citations (Sorted by Pagerank)

Showing 45 of 45 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
66 Spark SQL: Relational Data Processing in Spark 2015 SIGMOD 0.00061639801
168 MAD Skills: New Analysis Practices for Big Data 2009 VLDB 0.00038946305
173 Schema Mapping as Query Discovery 2000 VLDB 0.00038627829
303 Generic Schema Matching with Cupid 2001 VLDB 0.00028301477
382 COMA - A system for flexible combination of schema matching approaches 2002 VLDB 0.00024823252
476 Impala: A Modern, Open-Source SQL Engine for Hadoop 2015 CIDR 0.00022226941
483 Clio Grows Up: From Research Prototype to Industrial Tool 2005 SIGMOD 0.00022125107
621 Schema Mappings, Data Exchange, and Metadata Management 2005 PODS 0.00019005115
968 Schema and Ontology Matching with COMA++ 2005 SIGMOD 0.0001495703
1,065 Data-Driven Understanding and Refinement of Schema Mappings 2001 SIGMOD 0.00014338146
1,087 HOT: A Height Optimized Trie Index for Main-Memory Database Systems 2018 SIGMOD 0.00014162909
1,265 Jaql: A Scripting Language for Large Scale Semistructured Data Analysis 2011 VLDB 0.00012947629
1,277 The Data Civilizer System 2017 CIDR 0.00012879695
1,343 NoDB: Efficient Query Execution on Raw Data Files 2012 SIGMOD 0.00012482538
1,377 Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics 2021 CIDR 0.00012296941
1,527 Generic Schema Matching, Ten Years Later 2011 VLDB 0.00011499442
1,596 Clio: A Semi-Automatic Tool For Schema Mapping 2001 SIGMOD 0.00011214591
2,078 Sample-Driven Schema Mapping 2012 SIGMOD 9.599707e-05
2,122 SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle 2020 CIDR 9.4989076e-05
2,124 Characterizing Schema Mappings via Data Examples 2010 PODS 9.4912951e-05
2,367 Here are my Data Files. Here are my Queries. Where are my Results? 2011 CIDR 8.9511058e-05
2,700 Filter Before You Parse: Faster Analytics on Raw Data with Sparser 2018 VLDB 8.2728509e-05
2,757 Parallel Data Analysis Directly on Scientific File Formats 2014 SIGMOD 8.1679384e-05
2,819 Mison: A Fast JSON Parser for Data Analytics 2017 VLDB 8.0651326e-05
2,888 Sato: Contextual Semantic Type Detection in Tables 2020 VLDB 7.9594996e-05
2,973 Parallel In-Situ Data Processing with Speculative Loading 2014 SIGMOD 7.7902322e-05
3,437 Speculative Distributed CSV Data Parsing for Big Data Analytics 2019 SIGMOD 7.0942161e-05
3,467 Data Profiling – A Tutorial 2017 SIGMOD 7.069081e-05
3,548 Adaptive Query Processing on RAW Data 2014 VLDB 6.9859242e-05
3,866 Designing and Refining Schema Mappings via Data Examples 2011 SIGMOD 6.6837e-05
3,918 On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML 2018 VLDB 6.6315176e-05
3,940 NoDB in Action: Adaptive Query Processing on Raw Data 2012 VLDB 6.6153423e-05
4,326 Fast Queries Over Heterogeneous Data Through Engine Customization 2016 VLDB 6.288323e-05
4,704 JSON Tiles: Fast Analytics on Semi-Structured Data 2021 SIGMOD 5.9853687e-05
5,091 EIRENE: Interactive Design and Refinement of Schema Mappings via Data Examples 2011 VLDB 5.7032726e-05
5,242 Towards Benchmarking Feature Type Inference for AutoML Platforms 2021 SIGMOD 5.6074743e-05
5,301 ReCache: Reactive Caching for Fast Analytics over Heterogeneous Data 2018 VLDB 5.5790928e-05
5,595 Schemas and Types for JSON Data: from Theory to Practice 2019 SIGMOD 5.4191724e-05
6,648 Grizzly: Efficient Stream Processing Through Adaptive Query Compilation 2020 SIGMOD 4.9771723e-05
7,360 ParPaRaw: Massively Parallel Parsing of Delimiter-Separated Raw Data 2020 VLDB 4.7525925e-05
7,463 Automated Migration of Hierarchical Data to Relational Tables using Programming-by-Example 2018 VLDB 4.7232241e-05
7,704 ExDRa: Exploratory Data Science on Federated Raw Data 2021 SIGMOD 4.6733838e-05
7,830 Scalable Structural Index Construction for JSON Analytics 2021 VLDB 4.6388763e-05
8,271 Rumble: Data Independence for Large Messy Data Sets 2021 VLDB 4.5453618e-05
9,939 Witness Generation for JSON Schema 2022 VLDB 4.2462227e-05

Semantically Similar Papers