Database Paper Browser


Incremental Knowledge Base Construction Using DeepDive

Summary: DeepDive combines database and machine-learning techniques to speed up knowledge-base construction (KBC) from unstructured data. The paper proposes two incremental inference approaches, one based on sampling and one on variational methods, plus a rule-based optimizer for choosing between them. Together these achieve speedups of up to 100x with negligible impact on quality across five KBC workloads. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11010
Venue: VLDB
Year: 2015
Pagerank: 0.00018440557
Overall Rank: 667 (95.37th percentile)
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 42 of 42 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159
1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154
1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695
1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681
1,878 | Query-Driven On-The-Fly Knowledge Base Construction | 2018 | VLDB | 0.00010233436
1,938 | Split-Correctness in Information Extraction | 2019 | PODS | 0.00010028895
3,114 | GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization | 2024 | VLDB | 7.5451724e-05
3,303 | Fonduer: Knowledge Base Construction from Richly Formatted Data | 2018 | SIGMOD | 7.2487486e-05
3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05
3,897 | SLiMFast: Guaranteed Results for Data Fusion and Source Reliability | 2017 | SIGMOD | 6.6554845e-05
3,995 | How Large Language Models Will Disrupt Data Management | 2023 | VLDB | 6.5513237e-05
4,164 | SlimShot: In-Database Probabilistic Inference for Knowledge Bases | 2016 | VLDB | 6.3923099e-05
4,196 | Overton: A Data System for Monitoring and Improving Machine-Learned Products | 2020 | CIDR | 6.3686231e-05
4,630 | Knowledge Graphs 2021: A Data Odyssey | 2021 | VLDB | 6.0348379e-05
4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05
5,041 | KBPearl: A Knowledge Base Population System Supported by Joint Entity and Relation Linking | 2020 | VLDB | 5.741618e-05
5,455 | Natural Language Data Management and Interfaces: Recent Development and Open Challenges | 2017 | SIGMOD | 5.4977219e-05
5,806 | BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees | 2019 | SIGMOD | 5.3200643e-05
6,347 | A Relational Framework for Classifier Engineering | 2017 | PODS | 5.1019568e-05
6,469 | Materialization and Reuse Optimizations for Production Data Science Pipelines | 2022 | SIGMOD | 5.0519488e-05
6,986 | A Cost-based Optimizer for Gradient Descent Optimization | 2017 | SIGMOD | 4.8727048e-05
7,066 | On Multiple Semantics for Declarative Database Repairs | 2020 | SIGMOD | 4.8445108e-05
7,704 | ExDRa: Exploratory Data Science on Federated Raw Data | 2021 | SIGMOD | 4.6733838e-05
7,833 | Dependency-Driven Analytics: a Compass for Uncharted Data Oceans | 2017 | CIDR | 4.6382648e-05
7,947 | RuDiK: Rule Discovery in Knowledge Bases | 2018 | VLDB | 4.613363e-05
8,204 | ELEET: Efficient Learned Query Execution over Text and Tables | 2024 | VLDB | 4.5594273e-05
8,514 | UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads | 2022 | VLDB | 4.4944285e-05
8,581 | Anytime Approximation in Probabilistic Databases via Scaled Dissociations | 2019 | SIGMOD | 4.492241e-05
8,789 | Machine Learning Meets Big Spatial Data | 2019 | VLDB | 4.4509194e-05
8,968 | Ontological Pathfinding: Mining First-Order Knowledge from Large Knowledge Bases | 2016 | SIGMOD | 4.4190464e-05
9,020 | Entity Matching in the Wild: A Consistent and Versatile Framework to Unify Data in Industrial Applications | 2020 | SIGMOD | 4.4079449e-05
9,161 | Automatically Generating Interesting Facts from Wikipedia Tables | 2019 | SIGMOD | 4.3849295e-05
9,409 | Ground Truth Inference for Weakly Supervised Entity Matching | 2023 | SIGMOD | 4.3441378e-05
10,377 | FastPDB: Towards Bag-Probabilistic Queries at Interactive Speeds | 2025 | SIGMOD | 4.1945683e-05
11,179 | Probabilistic Reasoning at Scale: Trigger Graphs to the Rescue | 2023 | SIGMOD | 4.1945683e-05
11,520 | Wikinegata: a Knowledge Base with Interesting Negative Statements | 2021 | VLDB | 4.1945683e-05
11,678 | Flash in Action: Scalable Spatial Data Analysis Using Markov Logic Networks | 2019 | VLDB | 4.1945683e-05
11,718 | A Demonstration of Sya: A Spatial Probabilistic Knowledge Base Construction System | 2018 | SIGMOD | 4.1945683e-05
11,747 | Holistic Query Evaluation over Information Extraction Pipelines | 2018 | VLDB | 4.1945683e-05
11,775 | Building Structured Databases of Factual Knowledge from Massive Text Corpora | 2017 | SIGMOD | 4.1945683e-05
11,937 | Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction | 2015 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers