Auctus: A Dataset Search Engine for Data Discovery and Augmentation
Summary: Auctus is a dataset search engine for discovering and augmenting structured data across Web tables, portals, and enterprises. Its architecture supports rich dataset queries and exploration workflows, and case studies show how data augmentation can improve ML models and analytics.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 12473
- Venue: VLDB
- Year: 2021
- Pagerank: 0.00010683295
- Overall Rank: 1,751 | 87.83%
- DOI: 10.14778/3476311.3476346
Incoming Citations (Sorted by Pagerank)
Showing 23 of 23 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05 |
| 3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05 |
| 3,942 | Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins | 2022 | VLDB | 6.6114622e-05 |
| 4,967 | Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation | 2022 | SIGMOD | 5.7956612e-05 |
| 5,024 | Towards Distribution-aware Query Answering in Data Markets | 2022 | VLDB | 5.7535043e-05 |
| 5,963 | Automatic Data Acquisition for Deep Learning | 2021 | VLDB | 5.2526794e-05 |
| 5,976 | Responsible Data Integration: Next-generation Challenges | 2022 | SIGMOD | 5.245976e-05 |
| 6,077 | The Fast and the Private: Task-based Dataset Search | 2024 | CIDR | 5.2229324e-05 |
| 6,217 | Pneuma: Leveraging LLMs for Tabular Data Representation and Retrieval in an End-to-End System | 2025 | SIGMOD | 5.1534752e-05 |
| 6,449 | Causal Data Integration | 2023 | VLDB | 5.0587746e-05 |
| 7,491 | Saibot: A Differentially Private Data Search Platform | 2023 | VLDB | 4.7180617e-05 |
| 7,582 | LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes | 2024 | VLDB | 4.7046388e-05 |
| 7,868 | Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach | 2023 | SIGMOD | 4.6319504e-05 |
| 8,618 | Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data | 2024 | SIGMOD | 4.4838259e-05 |
| 9,928 | Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search | 2024 | VLDB | 4.2511622e-05 |
| 10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05 |
| 10,197 | Qualitative Join Discovery in Data Lakes using Examples | 2026 | SIGMOD | 4.1945683e-05 |
| 10,329 | Revisiting Task-Oriented Dataset Search in the Era of Large Language Models: Challenges, Benchmark, and Solution | 2026 | VLDB | 4.1945683e-05 |
| 10,341 | A Theoretical Framework for Distribution-Aware Dataset Search | 2025 | PODS | 4.1945683e-05 |
| 10,439 | Finding What You’re Looking For: A Distribution-Aware Dataset Search Engine in Action | 2025 | SIGMOD | 4.1945683e-05 |
| 10,645 | OpenForge: Probabilistic Metadata Integration | 2025 | VLDB | 4.1945683e-05 |
| 10,685 | LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes | 2025 | VLDB | 4.1945683e-05 |
| 11,379 | Fast Dataset Search with Earth Mover’s Distance | 2022 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 495 | Milvus: A Purpose-Built Vector Data Management System | 2021 | SIGMOD | 0.00021767688 |
| 10,438 | Doctopus: A System for Budget-aware Structural Data Extraction from Unstructured Documents | 2025 | SIGMOD | 4.1945683e-05 |
| 9,152 | Doctopus: Budget-aware Structural Table Extraction from Unstructured Documents | 2025 | VLDB | 4.3849295e-05 |
| 2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05 |
| 6,792 | Automatically Incorporating New Sources in Keyword Search-Based Data Integration | 2010 | SIGMOD | 4.9249098e-05 |
| 8,696 | Effective Entity Augmentation By Querying External Data Sources | 2023 | VLDB | 4.4660032e-05 |
| 5,529 | Data-Driven Domain Discovery for Structured Datasets | 2020 | VLDB | 5.4566641e-05 |
| 10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05 |
| 10,439 | Finding What You’re Looking For: A Distribution-Aware Dataset Search Engine in Action | 2025 | SIGMOD | 4.1945683e-05 |
| 10,329 | Revisiting Task-Oriented Dataset Search in the Era of Large Language Models: Challenges, Benchmark, and Solution | 2026 | VLDB | 4.1945683e-05 |