Extracting Databases from Dark Data with DeepDive
Summary: DeepDive turns dark data (text, tables, images) into relational databases. It combines large-scale probabilistic inference with a novel developer interaction cycle, achieving high precision and recall at modest cost. (summarized by gpt-5-nano on Feb 09 2026)
Authors
- 1. Ce Zhang
- 2. Jaeho Shin
- 3. Christopher Ré
- 4. Michael Cafarella
- 5. Feng Niu
Incoming Citations (Sorted by Pagerank)
Showing 8 of 8 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344 |
| 1,878 | Query-Driven On-The-Fly Knowledge Base Construction | 2018 | VLDB | 0.00010233436 |
| 3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05 |
| 3,155 | Ten Years of WebTables | 2018 | VLDB | 7.4672742e-05 |
| 5,251 | Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale | 2019 | SIGMOD | 5.6029615e-05 |
| 10,711 | Cracking Vector Search Indexes | 2025 | VLDB | 4.1945683e-05 |
| 10,976 | StarfishDB: a Query Execution Engine for Relational Probabilistic Programming | 2024 | SIGMOD | 4.1945683e-05 |
| 11,775 | Building Structured Databases of Factual Knowledge from Massive Text Corpora | 2017 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 12 of 12 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,181 | DivDB: A System for Diversifying Query Results | 2011 | VLDB | 6.3789851e-05 |
| 5,658 | Databases Unbound: Querying All of the World's Bytes with AI | 2024 | VLDB | 5.385675e-05 |
| 10,155 | DIVER: A Robust Text-to-SQL System with Dynamic Interactive Value Linking and Evidence Reasoning | 2026 | SIGMOD | 4.1945683e-05 |
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 6,133 | DIADEM: Thousands of Websites to a Single Database | 2014 | VLDB | 5.1954702e-05 |
| 6,722 | GeoDeepDive: Statistical Inference using Familiar Data-Processing Languages | 2013 | SIGMOD | 4.9491521e-05 |
| 608 | DeepDB: Learn from Data, not from Queries! | 2020 | VLDB | 0.00019235898 |
| 3,635 | A Deep Dive into Deep Learning Approaches for Text-to-SQL Systems | 2021 | SIGMOD | 6.8981006e-05 |
| 11,722 | Deeper: A Data Enrichment System Powered by Deep Web | 2018 | SIGMOD | 4.1945683e-05 |
| 667 | Incremental Knowledge Base Construction Using DeepDive | 2015 | VLDB | 0.00018440557 |