Mining an "Anti-Knowledge Base" from Wikipedia Updates with Applications to Fact Checking and Beyond
Summary: The paper mines an anti-knowledge base of factual mistakes from Wikipedia updates without supervision, focusing on long-tail errors for fact-checking benchmarks. A multi-step pipeline (heuristics, cross-web corroboration, EM inference, and SVO extraction) produces more than 110k ranked mistakes with 85% precision in the top 1%, enabling web-wide error discovery and analysis. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Georgios Karagiannis
2. Immanuel Trummer
3. Saehan Jo
4. Shubham Khandelwal
5. Xuezhi Wang
6. Cong Yu
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,436 | Demonstrating CEDAR: A System for Cost-Efficient Data-Driven Claim Verification | 2025 | SIGMOD | 4.1945683e-05 |
| 10,747 | CEDAR: A System for Cost-Efficient Data-Driven Claim Verification | 2025 | VLDB | 4.1945683e-05 |
| 11,465 | To Intervene or Not To Intervene: Cost based Intervention for Combating Fake News | 2021 | SIGMOD | 4.1945683e-05 |
| 11,520 | Wikinegata: a Knowledge Base with Interesting Negative Statements | 2021 | VLDB | 4.1945683e-05 |
| 11,534 | On the Limits of Machine Knowledge: Completeness, Recall and Negation in Web-scale Knowledge Bases | 2021 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555 |
| 287 | Declarative Information Extraction Using Datalog with Embedded Extraction Predicates | 2007 | VLDB | 0.00028971272 |
| 2,509 | ClaimBuster: The First-ever End-to-end Fact-checking System | 2017 | VLDB | 8.6260643e-05 |
| 2,567 | Resolving Conflicts in Heterogeneous Data by Truth Discovery and Source Reliability Estimation | 2014 | SIGMOD | 8.5239306e-05 |
| 2,937 | Truth Inference in Crowdsourcing: Is the Problem Solved? | 2017 | VLDB | 7.853108e-05 |
| 4,972 | Verifying Text Summaries of Relational Data Sets | 2019 | SIGMOD | 5.7931494e-05 |
| 6,780 | Domain-Aware Multi-Truth Discovery from Conflicting Sources | 2018 | VLDB | 4.9277708e-05 |
| 7,012 | Mining Subjective Properties on the Web | 2015 | SIGMOD | 4.8626409e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,520 | Wikinegata: a Knowledge Base with Interesting Negative Statements | 2021 | VLDB | 4.1945683e-05 |
| 7,029 | Computational Fact Checking: A Content Management Perspective | 2018 | VLDB | 4.8563777e-05 |
| 9,054 | Selecting Data to Clean for Fact Checking: Minimizing Uncertainty vs. Maximizing Surprise | 2019 | VLDB | 4.4039656e-05 |
| 9,137 | Combating Fake News: A Data Management and Mining Perspective | 2019 | VLDB | 4.3881065e-05 |
| 2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05 |
| 11,775 | Building Structured Databases of Factual Knowledge from Massive Text Corpora | 2017 | SIGMOD | 4.1945683e-05 |
| 9,161 | Automatically Generating Interesting Facts from Wikipedia Tables | 2019 | SIGMOD | 4.3849295e-05 |
| 3,495 | Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources | 2015 | VLDB | 7.0400666e-05 |
| 3,340 | Toward Computational Fact-Checking | 2014 | VLDB | 7.2030091e-05 |
| 7,648 | User Guidance for Efficient Fact Checking | 2019 | VLDB | 4.6889787e-05 |