Collaborative Data Analytics with DataHub
Summary: DataHub is a unified platform for collaborative data analytics: load, store, query, and share datasets. Native versioning with support for concurrent updates; an app ecosystem for diverse analytics; Thrift-based serialization across 20+ languages via a shared notebook. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Anant Bhardwaj
2. Amol Deshpande
3. Aaron J. Elmore
4. David Karger
5. Sam Madden
6. Aditya Parameswaran
7. Harihar Subramanyam
8. Eugene Wu
9. Rebecca Zhang
Incoming Citations (Sorted by Pagerank)
Showing 8 of 8 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674 |
| 1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 0.00011869295 |
| 2,152 | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | 2018 | SIGMOD | 9.4239787e-05 |
| 2,359 | Data Market Platforms: Trading Data Assets to Solve Data Problems | 2020 | VLDB | 8.9607667e-05 |
| 6,295 | Your notebook is not crumby enough, REPLace it | 2020 | CIDR | 5.1249204e-05 |
| 7,745 | Crossing the finish line faster when paddling the Data Lake with KAYAK | 2017 | VLDB | 4.6618625e-05 |
| 7,994 | TardisDB: Extending SQL to Support Versioning | 2021 | SIGMOD | 4.61099e-05 |
| 9,515 | Automating the Enterprise with Foundation Models | 2024 | VLDB | 4.3335877e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 883 | Google Fusion Tables: Web-Centered Data Management and Collaboration | 2010 | SIGMOD | 0.00015656548 |
| 1,281 | DataHub: Collaborative Data Science & Dataset Version Management at Scale | 2015 | CIDR | 0.00012854744 |
| 1,565 | Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff | 2015 | VLDB | 0.00011345567 |
| 5,107 | SeeDB: Automatically Generating Query Visualizations | 2014 | VLDB | 5.6925578e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,443 | Data Management for Data Science: Towards Embedded Analytics | 2020 | CIDR | 8.8078476e-05 |
| 6,508 | DataSpread: Unifying Databases and Spreadsheets | 2015 | VLDB | 5.0335028e-05 |
| 13,277 | The Challenge of Building Effective Data Lakes | 2020 | SIGMOD | - |
| 12,401 | Large-Scale Collaborative Analysis and Extraction of Web Data | 2008 | VLDB | 4.1945683e-05 |
| 9,197 | Demonstration of Collaborative and Interactive Workflow-Based Data Analytics in Texera | 2022 | VLDB | 4.3748331e-05 |
| 9,577 | CoClean: Collaborative Data Cleaning | 2020 | SIGMOD | 4.3248438e-05 |
| 11,919 | ShareInsights - An Unified Approach to Full-stack Data Processing | 2015 | SIGMOD | 4.1945683e-05 |
| 5,995 | DataChat: An Intuitive and Collaborative Data Analytics Platform | 2023 | SIGMOD | 5.2415551e-05 |
| 11,272 | Building a Collaborative Data Analytics System: Opportunities and Challenges | 2023 | VLDB | 4.1945683e-05 |
| 1,281 | DataHub: Collaborative Data Science & Dataset Version Management at Scale | 2015 | CIDR | 0.00012854744 |