Database Paper Browser


InfoGather: Entity Augmentation and Attribute Discovery By Holistic Matching with Web Tables

Summary: InfoGather performs entity augmentation and attribute discovery over web tables using holistic matching. It supports augmenting entities either by attribute name or by example, and it discovers attributes with high precision. Topic-sensitive PageRank enables indirect table matching and aggregation across multiple tables. MapReduce-based preprocessing delivers near-interactive query latency and a speedup of four orders of magnitude on 573M web tables. (summarized by gpt-5-nano on Feb 09 2026)
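The summary says that topic-sensitive PageRank enables indirect table matching. A generic personalized PageRank, computed by power iteration, shows how this can work. The sketch below is not InfoGather's implementation. It assumes a hypothetical match graph in which tables are nodes and edge weights stand for direct match scores. It also assumes the restart ("topic") distribution is concentrated on tables that directly match the query table. The function name `personalized_pagerank`, the graph encoding, and the example tables T1 to T4 are all illustrative.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iters=200, tol=1e-12):
    """Topic-sensitive (personalized) PageRank by power iteration.

    Illustrative sketch only, not InfoGather's code.
    graph: dict mapping node -> dict of neighbor -> non-negative edge weight.
    seeds: dict mapping node -> restart weight (the "topic" distribution).
    Returns a dict of node -> score; scores sum to 1.
    """
    nodes = set(graph) | set(seeds)
    for nbrs in graph.values():
        nodes.update(nbrs)
    total_seed = sum(seeds.values())
    if total_seed <= 0:
        raise ValueError("seeds must have positive total weight")
    # Restart (teleport) distribution: random jumps land only on seed nodes.
    restart = {n: seeds.get(n, 0.0) / total_seed for n in nodes}
    score = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        dangling = 0.0
        for u in nodes:
            out = graph.get(u, {})
            out_weight = sum(out.values())
            if out_weight == 0:
                # No outgoing edges: collect this node's mass for redistribution.
                dangling += score[u]
                continue
            for v, w in out.items():
                nxt[v] += alpha * score[u] * w / out_weight
        # Send dangling mass back to the seeds so total mass stays 1.
        for n in nodes:
            nxt[n] += alpha * dangling * restart[n]
        delta = sum(abs(nxt[n] - score[n]) for n in nodes)
        score = nxt
        if delta < tol:
            break
    return score


# Hypothetical match graph: T1 directly matches the query table, T3 is
# reachable only through T2 (an indirect match), and T4 is unconnected.
graph = {
    "T1": {"T2": 1.0},
    "T2": {"T1": 1.0, "T3": 1.0},
    "T3": {"T2": 1.0},
    "T4": {},
}
scores = personalized_pagerank(graph, seeds={"T1": 1.0})
```

In this toy graph, T3 receives a nonzero score even though it has no direct edge to the seed table, while the unconnected T4 scores zero. A random walk that restarts at directly matching tables can therefore surface indirect matches.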

Paper ID: 4515
Venue: SIGMOD
Year: 2012
Pagerank: 0.00023719065
Overall Rank: 420 | 97.09%
DOI: -


Authors

Incoming Citations (Sorted by Pagerank)

Showing 48 of 48 citing papers.

Rank Citing Paper Year Venue Pagerank
513 TURL: Table Understanding through Representation Learning 2021 VLDB 0.00021288342
610 Goods: Organizing Google's Datasets 2016 SIGMOD 0.00019232674
939 Data Lake Management: Challenges and Opportunities 2019 VLDB 0.00015187344
1,178 Table Union Search on Open Data 2018 VLDB 0.00013468118
1,277 The Data Civilizer System 2017 CIDR 0.00012879695
1,463 ARDA: Automatic Relational Data Augmentation for Machine Learning 2020 VLDB 0.00011869295
2,141 LSH Ensemble: Internet-Scale Domain Search 2016 VLDB 9.4542625e-05
2,359 Data Market Platforms: Trading Data Assets to Solve Data Problems 2020 VLDB 8.9607667e-05
2,587 Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks 2024 SIGMOD 8.4924618e-05
2,633 Schema Extraction for Tabular Data on the Web 2013 VLDB 8.4063569e-05
2,730 Open Data Integration 2018 VLDB 8.2126735e-05
2,836 Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning 2023 VLDB 8.0443826e-05
3,155 Ten Years of WebTables 2018 VLDB 7.4672742e-05
3,229 InfoGather+: Semantic Matching and Annotation of Numeric and Time-Varying Attributes in Web Tables 2013 SIGMOD 7.3393682e-05
3,288 Biperpedia: An Ontology for Search Applications 2014 VLDB 7.273034e-05
3,473 AI Meets Database: AI4DB and DB4AI 2021 SIGMOD 7.062864e-05
3,478 Transform-Data-by-Example (TDE): An Extensible Search Engine for Data Transformations 2018 VLDB 7.054159e-05
3,690 Navigating the Data Lake with DATAMARAN: Automatically Extracting Structure from Log Datasets 2018 SIGMOD 6.8384476e-05
3,742 TEGRA: Table Extraction by Global Record Alignment 2015 SIGMOD 6.7966898e-05
3,797 Stitching Web Tables for Improving Matching Quality 2017 VLDB 6.7597149e-05
3,963 Pytheas: Pattern-based Table Discovery in CSV Files 2020 VLDB 6.5840643e-05
4,695 DataXFormer: An Interactive Data Transformation Tool 2015 SIGMOD 5.9927993e-05
4,838 Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers 2014 VLDB 5.8887949e-05
4,850 SEMA-JOIN: Joining Semantically-Related Tables Using Big Table Corpora 2015 VLDB 5.8768452e-05
5,529 Data-Driven Domain Discovery for Structured Datasets 2020 VLDB 5.4566641e-05
5,937 DataXFormer: Leveraging the Web for Semantic Transformations 2015 CIDR 5.2650964e-05
6,092 Observatory: Characterizing Embeddings of Relational Tables 2024 VLDB 5.2138566e-05
6,237 New Trends on Exploratory Methods for Data Analytics 2017 VLDB 5.1435341e-05
6,270 MATE: Multi-Attribute Table Extraction 2022 VLDB 5.1337451e-05
7,582 LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes 2024 VLDB 4.7046388e-05
7,588 Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases 2013 VLDB 4.7030914e-05
8,116 LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes 2024 VLDB 4.581507e-05
8,135 Applying WebTables in Practice 2015 CIDR 4.5777549e-05
8,344 Exploring the Data Wilderness through Examples 2019 SIGMOD 4.5428111e-05
8,499 Synthesizing Mapping Relationships Using Table Corpus 2017 SIGMOD 4.4975851e-05
8,678 Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment 2019 SIGMOD 4.4702119e-05
8,696 Effective Entity Augmentation By Querying External Data Sources 2023 VLDB 4.4660032e-05
9,273 ActiveDeeper: A Model-based Active Data Enrichment System 2020 VLDB 4.3649603e-05
10,685 LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes 2025 VLDB 4.1945683e-05
10,753 Cents: A Flexible and Cost-Effective Framework for LLM-Based Table Understanding 2025 VLDB 4.1945683e-05
10,754 OmniMatch: Joinability Discovery in Data Products 2025 VLDB 4.1945683e-05
10,836 Data Discovery in Data Lakes: Operations, Indexes, Systems 2025 VLDB 4.1945683e-05
11,547 CAFE: Constraint-Aware Feature Extraction from Large Databases 2020 CIDR 4.1945683e-05
11,722 Deeper: A Data Enrichment System Powered by Deep Web 2018 SIGMOD 4.1945683e-05
11,775 Building Structured Databases of Factual Knowledge from Massive Text Corpora 2017 SIGMOD 4.1945683e-05
11,847 Automatic Entity Recognition and Typing in Massive Text Data 2016 SIGMOD 4.1945683e-05
11,895 Finding Quality in Quantity: The Challenge of Discovering Valuable Sources for Integration 2015 CIDR 4.1945683e-05
11,971 Mining Latent Entity Structures from Massive Unstructured and Interconnected Data 2014 SIGMOD 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 11 of 11 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers