Inference of Concise DTDs from XML Data
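Summary: The paper infers concise XML DTDs by learning single occurrence regular expressions (SOREs) and chain regular expressions (CHAREs), two classes that cover most DTDs used in practice. The iDTD algorithm derives SOREs by rewriting an automaton into a regular expression; the crx algorithm learns CHAREs directly and targets cases where only a small amount of data is available. Together they support fast, accurate schema inference, including handling of noisy data. (summarized by gpt-5-nano on Feb 09 2026)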
Authors
1. Geert Jan Bex
2. Frank Neven
3. Thomas Schwentick
4. Karl Tuyls
Incoming Citations (Sorted by Pagerank)
Showing 11 of 11 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 809 | Curated Databases | 2008 | PODS | 0.00016430384 |
| 2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05 |
| 2,864 | Inferring XML Schema Definitions from XML Data | 2007 | VLDB | 7.9863574e-05 |
| 3,845 | On Repairing Structural Problems In Semi-structured Data | 2013 | VLDB | 6.7073366e-05 |
| 5,948 | Minimization of Tree Pattern Queries with Constraints | 2008 | SIGMOD | 5.2602218e-05 |
| 7,571 | Reducing Ambiguity in Json Schema Discovery | 2021 | SIGMOD | 4.7075853e-05 |
| 8,943 | Towards Theory for Real-World Data | 2022 | PODS | 4.4258797e-05 |
| 11,150 | Zed: Leveraging Data Types to Process Eclectic Data | 2023 | CIDR | 4.1945683e-05 |
| 11,936 | TreeScope: Finding Structural Anomalies In Semi-Structured Data | 2015 | VLDB | 4.1945683e-05 |
| 12,306 | Simplifying XML Schema: Effortless Handling of Nondeterministic Regular Expressions | 2009 | SIGMOD | 4.1945683e-05 |
| 12,372 | SchemaScope: a System for Inferring and Cleaning XML Schemas | 2008 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 61 | DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases | 1997 | VLDB | 0.00064329285 |
| 188 | Applying Model Management to Classical Meta Data Problems | 2003 | CIDR | 0.00035968389 |
| 207 | Storing Semistructured Data with STORED | 1999 | SIGMOD | 0.00034611968 |
| 1,011 | ToXgene: A template-based data generator for XML | 2002 | SIGMOD | 0.00014652718 |
| 1,163 | Extracting Schema from Semistructured Data | 1998 | SIGMOD | 0.00013577466 |
| 1,245 | Answering XML Queries over Heterogeneous Data Sources | 2001 | VLDB | 0.00013080995 |
| 1,929 | XPath Satisfiability in the Presence of DTDs | 2005 | PODS | 0.00010058897 |
| 2,676 | LORE: A Lightweight Object REpository for Semistructured Data | 1996 | SIGMOD | 8.3274001e-05 |
| 3,925 | Schema-based Scheduling of Event Processors and Buffer Minimization for Queries on Structured Data Streams | 2004 | VLDB | 6.6260709e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,589 | Scalable Regular Expression Matching on Data Streams | 2008 | SIGMOD | 6.06476e-05 |
| 1,356 | Validating Streaming XML Documents | 2002 | PODS | 0.0001239231 |
| 12,418 | XML-Document-Filtering Automaton | 2008 | VLDB | 4.1945683e-05 |
| 13,723 | TREX: DTD-Conforming XML to XML Transformations | 2003 | SIGMOD | - |
| 2,211 | XML Data Exchange: Consistency and Query Answering | 2005 | PODS | 9.2771941e-05 |
| 12,218 | A Learning Algorithm for Top-Down XML Transformations | 2010 | PODS | 4.1945683e-05 |
| 12,102 | Deterministic Regular Expressions in Linear Time | 2012 | PODS | 4.1945683e-05 |
| 882 | DTD Inference for Views of XML Data | 2000 | PODS | 0.00015657456 |
| 2,864 | Inferring XML Schema Definitions from XML Data | 2007 | VLDB | 7.9863574e-05 |
| 992 | XTRACT: A System for Extracting Document Type Descriptors from XML Documents | 2000 | SIGMOD | 0.00014799689 |