Chorus: Foundation Models for Unified Data Discovery and Exploration
Summary: Applies foundation models to unify data discovery and exploration, outperforming task-specific models, and often human experts, on table-class detection, column-type annotation, and join-column prediction. Evaluates cross-model generalizability and nondeterminism, and argues that foundation models can unify data-management tasks.
(summarized by gpt-5-mini on Feb 09 2026)
- Paper ID: 13442
- Venue: VLDB
- Year: 2024
- Pagerank: 7.7092391e-05
- Overall Rank: 3,015 | 79.03%
- DOI: 10.14778/3659437.3659461
[Chart: Incoming Non-self Citations Over Time]
Incoming Citations (Sorted by Pagerank)
Showing 19 of 19 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
| 1,963 | DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | 2025 | VLDB | 9.929429e-05 |
| 2,587 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks | 2024 | SIGMOD | 8.4924618e-05 |
| 3,876 | The Design of an LLM-powered Unstructured Analytics System | 2025 | CIDR | 6.6741456e-05 |
| 5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05 |
| 5,509 | Can Large Language Models Predict Data Correlations from Column Names? | 2023 | VLDB | 5.4703368e-05 |
| 5,840 | Logical and Physical Optimizations for SQL Query Execution over Large Language Models | 2025 | SIGMOD | 5.3042561e-05 |
| 5,928 | SchemaPile: A Large Collection of Relational Database Schemas | 2024 | SIGMOD | 5.2685946e-05 |
| 6,092 | Observatory: Characterizing Embeddings of Relational Tables | 2024 | VLDB | 5.2138566e-05 |
| 7,026 | Mind the Data Gap: Bridging LLMs to Enterprise Data Integration | 2025 | CIDR | 4.8570811e-05 |
| 7,048 | Magneto: Combining Small and Large Language Models for Schema Matching | 2025 | VLDB | 4.8520651e-05 |
| 8,736 | Unveiling Challenges for LLMs in Enterprise Data Engineering | 2026 | VLDB | 4.456315e-05 |
| 9,515 | Automating the Enterprise with Foundation Models | 2024 | VLDB | 4.3335877e-05 |
| 10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05 |
| 10,465 | A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces | 2025 | SIGMOD | 4.1945683e-05 |
| 10,503 | Self-Enhancing Video Data Management System for Compositional Events with Large Language Models | 2025 | SIGMOD | 4.1945683e-05 |
| 10,595 | Optimized Batch Prompting for Cost-effective LLMs | 2025 | VLDB | 4.1945683e-05 |
| 10,610 | Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation | 2025 | VLDB | 4.1945683e-05 |
| 10,753 | Cents: A Flexible and Cost-Effective Framework for LLM-Based Table Understanding | 2025 | VLDB | 4.1945683e-05 |
| 10,860 | Exploring Exploratory Querying | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 18 of 18 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| 31 | Provenance Semirings | 2007 | PODS | 0.0007857786 |
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684 |
| 475 | Mining Database Structure; Or, How to Build a Data Quality Browser | 2002 | SIGMOD | 0.00022303253 |
| 513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344 |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639 |
| 1,751 | Auctus: A Dataset Search Engine for Data Discovery and Augmentation | 2021 | VLDB | 0.00010683295 |
| 2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05 |
| 2,888 | Sato: Contextual Semantic Type Detection in Tables | 2020 | VLDB | 7.9594996e-05 |
| 3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05 |
| 3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05 |
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 3,520 | GitTables: A Large-Scale Corpus of Relational Tables | 2023 | SIGMOD | 7.0131061e-05 |
| 4,106 | Extracting Databases from Dark Data with DeepDive | 2016 | SIGMOD | 6.4456184e-05 |
| 5,486 | Fast Foreign-Key Detection in Microsoft SQL Server PowerPivot for Excel | 2014 | VLDB | 5.4811603e-05 |
| 6,890 | Towards NLP-Enhanced Data Profiling Tools | 2022 | CIDR | 4.8928923e-05 |
Semantically Similar Papers