Simplifying Access to Large-scale Structured Datasets by Meta-Profiling with Scalable Training Set Enrichment
Summary: Large-scale structured datasets with millions of tables and diverse schemas hinder topic-centric discovery and querying. A deep-learning-driven, unsupervised training-set enrichment approach yields Meta-profile, a standardized topic interface that gives access to all relevant topical tables across ultra-large corpora. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Sophie Pavia
2. Rituparna Khan
3. Anna Pyayt
4. Michael Gubanov
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684 |
| 188 | Applying Model Management to Classical Meta Data Problems | 2003 | CIDR | 0.00035968389 |
| 1,078 | Model Management 2.0: Manipulating Richer Mappings | 2007 | SIGMOD | 0.00014245848 |
| 12,324 | IBM UFO Repository: Object-Oriented Data Integration | 2009 | VLDB | 0.000041945683 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,643 | Cross Modal Data Discovery over Structured and Unstructured Data Lakes | 2023 | VLDB | 4.6901105e-05 |
| 8,913 | Making Table Understanding Work in Practice | 2022 | CIDR | 4.427232e-05 |
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 6,894 | TableDC: Deep Clustering for Tabular Data | 2025 | SIGMOD | 4.8925595e-05 |
| 6,368 | Pre-training Summarization Models of Structured Datasets for Cardinality Estimation | 2022 | VLDB | 5.0937722e-05 |
| 10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05 |
| 8,996 | MetaInsight: Automatic Discovery of Structured Knowledge for Exploratory Data Analysis | 2021 | SIGMOD | 4.4124959e-05 |
| 5,529 | Data-Driven Domain Discovery for Structured Datasets | 2020 | VLDB | 5.4566641e-05 |
| 6,890 | Towards NLP-Enhanced Data Profiling Tools | 2022 | CIDR | 4.8928923e-05 |
| 1,625 | Data Profiling with Metanome | 2015 | VLDB | 0.00011094926 |