Extracting Schema from Semistructured Data
Summary: Models semistructured data as labeled directed graphs and types the objects using the greatest-fixpoint semantics of monadic Datalog programs. Presents an approximate typing algorithm and shows that optimal typing is NP-hard; clustering-based heuristics yield near-optimal results. Includes preliminary experiments. (summarized by gpt-5-nano on Feb 09 2026)
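To make the greatest-fixpoint idea in the summary concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes a simplified rule format in which each type lists the outgoing labeled edges an object must have, along with the type required of each target. The function name `greatest_fixpoint_typing`, the rule encoding, and the sample graph are all hypothetical and chosen for illustration. The computation starts by assigning every type to every object and removes assignments that violate a rule until nothing changes.

```python
from collections import defaultdict


def greatest_fixpoint_typing(edges, type_rules):
    """Compute type extents under greatest-fixpoint semantics (illustrative sketch).

    edges:      iterable of (source, label, target) triples of a labeled directed graph
    type_rules: dict mapping a type name to a set of (label, target_type) requirements;
                an object has the type only if, for every requirement, it has an
                outgoing edge with that label to an object of target_type
    """
    objects = {node for src, _, dst in edges for node in (src, dst)}
    outgoing = defaultdict(set)
    for src, label, dst in edges:
        outgoing[src].add((label, dst))

    # Start from the top element: every object is assumed to have every type.
    extent = {type_name: set(objects) for type_name in type_rules}

    changed = True
    while changed:
        changed = False
        for type_name, requirements in type_rules.items():
            for obj in list(extent[type_name]):
                satisfied = all(
                    any(lbl == label and dst in extent[target_type]
                        for lbl, dst in outgoing[obj])
                    for label, target_type in requirements
                )
                if not satisfied:
                    # Remove the assignment; this may invalidate other assignments,
                    # so keep iterating until a full pass removes nothing.
                    extent[type_name].discard(obj)
                    changed = True
    return extent


# Hypothetical sample graph: person and company are mutually recursive types.
edges = [
    ("p1", "name", "n1"),
    ("p1", "worksFor", "c1"),
    ("c1", "name", "n2"),
    ("c1", "employs", "p1"),
    ("p2", "name", "n3"),  # p2 has no worksFor edge
]
type_rules = {
    "any": set(),  # no requirements, so every object qualifies
    "person": {("name", "any"), ("worksFor", "company")},
    "company": {("name", "any"), ("employs", "person")},
}

extent = greatest_fixpoint_typing(edges, type_rules)
print(sorted(extent["person"]), sorted(extent["company"]))
```

In this sample, `p1` and `c1` support each other's types through a cycle. A least-fixpoint evaluation that starts from empty extents would derive neither, which is one reason a greatest-fixpoint reading suits cyclic graph data.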
Incoming Citations (Sorted by Pagerank)
Showing 11 of 11 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801 |
| 207 | Storing Semistructured Data with STORED | 1999 | SIGMOD | 0.00034611968 |
| 882 | DTD Inference for Views of XML Data | 2000 | PODS | 0.00015657456 |
| 992 | XTRACT: A System for Extracting Document Type Descriptors from XML Documents | 2000 | SIGMOD | 0.00014799689 |
| 2,864 | Inferring XML Schema Definitions from XML Data | 2007 | VLDB | 7.9863574e-05 |
| 3,138 | Inference of Concise DTDs from XML Data | 2006 | VLDB | 7.4876241e-05 |
| 3,349 | Schema Management for Document Stores | 2015 | VLDB | 7.1903648e-05 |
| 3,681 | Queries with Incomplete Answers over Semistructured Data | 1999 | PODS | 6.8492288e-05 |
| 7,571 | Reducing Ambiguity in Json Schema Discovery | 2021 | SIGMOD | 4.7075853e-05 |
| 8,632 | Measuring the Structural Similarity of Semistructured Documents Using Entropy | 2007 | VLDB | 4.4803734e-05 |
| 12,663 | Querying Websites Using Compact Skeletons | 2001 | PODS | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 61 | DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases | 1997 | VLDB | 0.00064329285 |
| 114 | A Query Language and Optimization Techniques for Unstructured Data | 1996 | SIGMOD | 0.00046339735 |
| 1,669 | Query Decomposition and View Maintenance for Query Languages for Unstructured Data | 1996 | VLDB | 0.00010955767 |
| 3,956 | An Object Data Model with Roles | 1993 | VLDB | 6.5908944e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,257 | Path Constraints on Semistructured and Structured Data | 1998 | PODS | 7.3151681e-05 |
| 3,866 | Designing and Refining Schema Mappings via Data Examples | 2011 | SIGMOD | 6.6837e-05 |
| 2,907 | Convergence of Datalog over (Pre-) Semirings | 2022 | PODS | 7.933806e-05 |
| 9,676 | Schema-Based Query Optimisation for Graph Databases | 2025 | SIGMOD | 4.3047774e-05 |
| 8,971 | A Principled Approach to Bridging the Gap between Graph Data and their Schemas | 2014 | VLDB | 4.4187977e-05 |
| 12,382 | Type Inference and Type Checking for Queries on Execution Traces | 2008 | VLDB | 4.1945683e-05 |
| 5,992 | Evaluating Datalog over Semirings: A Grounding-based Approach | 2024 | PODS | 5.2415551e-05 |
| 1,897 | Type Inference for Queries on Semistructured Data (Extended Abstract) | 1999 | PODS | 0.00010178006 |
| 3,681 | Queries with Incomplete Answers over Semistructured Data | 1999 | PODS | 6.8492288e-05 |
| 1,314 | Semistructured Data | 1997 | PODS | 0.0001263326 |