ReCG: Bottom-Up JSON Schema Discovery Using a Repetitive Cluster-and-Generalize Framework
Summary: ReCG is a bottom-up JSON schema discovery method that builds schemas from the leaf nodes upward through a repetitive cluster-and-generalize process, avoiding brittle top-down heuristics. It applies the minimum description length (MDL) principle to select concise, generalizable schemas, and reports gains of up to 47% in precision and recall and 46% in F1, along with a 2.11x speedup. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Joohyung Yun
2. Byungchul Tak
3. Wook-Shin Han
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Overall Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,860 | Exploring Exploratory Querying | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 17 of 17 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 12,223 | Schema Clustering and Retrieval for Multi-domain Pay-As-You-Go Data Integration Systems | 2010 | SIGMOD | 4.1945683e-05 |
| 5,947 | Top-K Generation of Integrated Schemas Based on Directed and Weighted Correspondences | 2009 | SIGMOD | 5.2614521e-05 |
| 11,575 | JSON Schema Matching: Empirical Observations | 2020 | SIGMOD | 4.1945683e-05 |
| 2,781 | JSON: Data model, Query languages and Schema specification | 2017 | PODS | 8.1305074e-05 |
| 10,294 | Streaming Validation of JSON Documents Against Schemas | 2026 | VLDB | 4.1945683e-05 |
| 9,939 | Witness Generation for JSON Schema | 2022 | VLDB | 4.2462227e-05 |
| 3,349 | Schema Management for Document Stores | 2015 | VLDB | 7.1903648e-05 |
| 4,489 | Automatic Generation of Normalized Relational Schemas from Nested Key-Value Data | 2016 | SIGMOD | 6.1434237e-05 |
| 11,248 | Scalable Reasoning on Document Stores via Instance-Aware Query Rewriting | 2023 | VLDB | 4.1945683e-05 |
| 7,571 | Reducing Ambiguity in Json Schema Discovery | 2021 | SIGMOD | 4.7075853e-05 |