Database Paper Browser

Auctus: A Dataset Search Engine for Data Discovery and Augmentation

Summary: Auctus is a dataset search engine for discovering and augmenting structured data across Web tables, portals, and enterprises. Its architecture supports rich dataset queries and exploration workflows, and case studies show how data augmentation can improve ML models and analytics. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12473
Venue: VLDB
Year: 2021
Pagerank: 0.00010683295
Overall Rank: 1,751 | 87.83%
DOI: 10.14778/3476311.3476346

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 23 of 23 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05
3,942 | Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins | 2022 | VLDB | 6.6114622e-05
4,967 | Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation | 2022 | SIGMOD | 5.7956612e-05
5,024 | Towards Distribution-aware Query Answering in Data Markets | 2022 | VLDB | 5.7535043e-05
5,963 | Automatic Data Acquisition for Deep Learning | 2021 | VLDB | 5.2526794e-05
5,976 | Responsible Data Integration: Next-generation Challenges | 2022 | SIGMOD | 5.245976e-05
6,077 | The Fast and the Private: Task-based Dataset Search | 2024 | CIDR | 5.2229324e-05
6,217 | Pneuma: Leveraging LLMs for Tabular Data Representation and Retrieval in an End-to-End System | 2025 | SIGMOD | 5.1534752e-05
6,449 | Causal Data Integration | 2023 | VLDB | 5.0587746e-05
7,491 | Saibot: A Differentially Private Data Search Platform | 2023 | VLDB | 4.7180617e-05
7,582 | LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes | 2024 | VLDB | 4.7046388e-05
7,868 | Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach | 2023 | SIGMOD | 4.6319504e-05
8,618 | Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data | 2024 | SIGMOD | 4.4838259e-05
9,928 | Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search | 2024 | VLDB | 4.2511622e-05
10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05
10,197 | Qualitative Join Discovery in Data Lakes using Examples | 2026 | SIGMOD | 4.1945683e-05
10,329 | Revisiting Task-Oriented Dataset Search in the Era of Large Language Models: Challenges, Benchmark, and Solution | 2026 | VLDB | 4.1945683e-05
10,341 | A Theoretical Framework for Distribution-Aware Dataset Search | 2025 | PODS | 4.1945683e-05
10,439 | Finding What You’re Looking For: A Distribution-Aware Dataset Search Engine in Action | 2025 | SIGMOD | 4.1945683e-05
10,645 | OpenForge: Probabilistic Metadata Integration | 2025 | VLDB | 4.1945683e-05
10,685 | LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes | 2025 | VLDB | 4.1945683e-05
11,379 | Fast Dataset Search with Earth Mover’s Distance | 2022 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 6 of 6 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers