Database Paper Browser

Chorus: Foundation Models for Unified Data Discovery and Exploration

Summary: Applies foundation models to unify data discovery and exploration, outperforming task-specific models and often human experts on three tasks: table-class detection, column-type annotation, and join-column prediction. Evaluates cross-model generalizability and nondeterminism, and argues that foundation models can unify data-management tasks. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 13442
Venue: VLDB
Year: 2024
Pagerank: 7.7092391e-05
Overall Rank: 3,015 | 79.03%
DOI: 10.14778/3659437.3659461

Incoming Citations (Sorted by Pagerank)

Showing 19 of 19 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,963 | DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | 2025 | VLDB | 9.929429e-05
2,587 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks | 2024 | SIGMOD | 8.4924618e-05
3,876 | The Design of an LLM-powered Unstructured Analytics System | 2025 | CIDR | 6.6741456e-05
5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05
5,509 | Can Large Language Models Predict Data Correlations from Column Names? | 2023 | VLDB | 5.4703368e-05
5,840 | Logical and Physical Optimizations for SQL Query Execution over Large Language Models | 2025 | SIGMOD | 5.3042561e-05
5,928 | SchemaPile: A Large Collection of Relational Database Schemas | 2024 | SIGMOD | 5.2685946e-05
6,092 | Observatory: Characterizing Embeddings of Relational Tables | 2024 | VLDB | 5.2138566e-05
7,026 | Mind the Data Gap: Bridging LLMs to Enterprise Data Integration | 2025 | CIDR | 4.8570811e-05
7,048 | Magneto: Combining Small and Large Language Models for Schema Matching | 2025 | VLDB | 4.8520651e-05
8,736 | Unveiling Challenges for LLMs in Enterprise Data Engineering | 2026 | VLDB | 4.456315e-05
9,515 | Automating the Enterprise with Foundation Models | 2024 | VLDB | 4.3335877e-05
10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05
10,465 | A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces | 2025 | SIGMOD | 4.1945683e-05
10,503 | Self-Enhancing Video Data Management System for Compositional Events with Large Language Models | 2025 | SIGMOD | 4.1945683e-05
10,595 | Optimized Batch Prompting for Cost-effective LLMs | 2025 | VLDB | 4.1945683e-05
10,610 | Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation | 2025 | VLDB | 4.1945683e-05
10,753 | Cents: A Flexible and Cost-Effective Framework for LLM-Based Table Understanding | 2025 | VLDB | 4.1945683e-05
10,860 | Exploring Exploratory Querying | 2025 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 18 of 18 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
31 | Provenance Semirings | 2007 | PODS | 0.0007857786
107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684
475 | Mining Database Structure; Or, How to Build a Data Quality Browser | 2002 | SIGMOD | 0.00022303253
513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342
517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
1,751 | Auctus: A Dataset Search Engine for Data Discovery and Augmentation | 2021 | VLDB | 0.00010683295
2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05
2,888 | Sato: Contextual Semantic Type Detection in Tables | 2020 | VLDB | 7.9594996e-05
3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05
3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05
3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05
3,520 | GitTables: A Large-Scale Corpus of Relational Tables | 2023 | SIGMOD | 7.0131061e-05
4,106 | Extracting Databases from Dark Data with DeepDive | 2016 | SIGMOD | 6.4456184e-05
5,486 | Fast Foreign-Key Detection in Microsoft SQL Server PowerPivot for Excel | 2014 | VLDB | 5.4811603e-05
6,890 | Towards NLP-Enhanced Data Profiling Tools | 2022 | CIDR | 4.8928923e-05

Semantically Similar Papers