Database Paper Browser

Data Integration for the Relational Web

Summary: Octopus unifies search, extraction, cleaning, and integration for Web-sourced relational data. It provides best-effort operators (Search, Context, Extend) with automation and user feedback to discover and join heterogeneous sources during integration. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 9912
Venue: VLDB
Year: 2009
Pagerank: 0.00021158934
Overall Rank: 518 | 96.40%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 40 of 40 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
420 | InfoGather: Entity Augmentation and Attribute Discovery By Holistic Matching with Web Tables | 2012 | SIGMOD | 0.00023719065
818 | Finding Related Tables | 2012 | SIGMOD | 0.00016311524
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
1,001 | Recovering Semantics of Tables on the Web | 2011 | VLDB | 0.00014706505
1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118
1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695
1,367 | Answering Table Queries on the Web using Column Keywords | 2012 | VLDB | 0.00012349783
1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 0.00011869295
2,078 | Sample-Driven Schema Mapping | 2012 | SIGMOD | 9.599707e-05
2,617 | Extraction and Integration of Partially Overlapping Web Sources | 2013 | VLDB | 8.4462621e-05
2,633 | Schema Extraction for Tabular Data on the Web | 2013 | VLDB | 8.4063569e-05
2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05
3,155 | Ten Years of WebTables | 2018 | VLDB | 7.4672742e-05
3,229 | InfoGather+: Semantic Matching and Annotation of Numeric and Time-Varying Attributes in Web Tables | 2013 | SIGMOD | 7.3393682e-05
3,358 | Organizing Data Lakes for Navigation | 2020 | SIGMOD | 7.1784949e-05
3,742 | TEGRA: Table Extraction by Global Record Alignment | 2015 | SIGMOD | 6.7966898e-05
3,797 | Stitching Web Tables for Improving Matching Quality | 2017 | VLDB | 6.7597149e-05
3,942 | Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins | 2022 | VLDB | 6.6114622e-05
3,963 | Pytheas: Pattern-based Table Discovery in CSV Files | 2020 | VLDB | 6.5840643e-05
3,995 | How Large Language Models Will Disrupt Data Management | 2023 | VLDB | 6.5513237e-05
4,173 | Automatic Example Queries for Ad Hoc Databases | 2011 | SIGMOD | 6.3874627e-05
4,850 | SEMA-JOIN: Joining Semantically-Related Tables Using Big Table Corpora | 2015 | VLDB | 5.8768452e-05
4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05
5,024 | Towards Distribution-aware Query Answering in Data Markets | 2022 | VLDB | 5.7535043e-05
5,058 | A Demo of the Data Civilizer System | 2017 | SIGMOD | 5.7280139e-05
5,652 | From Information to Knowledge: Harvesting Entities and Relationships from Web Sources | 2010 | PODS | 5.3903671e-05
5,937 | DataXFormer: Leveraging the Web for Semantic Transformations | 2015 | CIDR | 5.2650964e-05
6,270 | MATE: Multi-Attribute Table Extraction | 2022 | VLDB | 5.1337451e-05
7,048 | Magneto: Combining Small and Large Language Models for Schema Matching | 2025 | VLDB | 4.8520651e-05
7,919 | DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web | 2015 | VLDB | 4.616746e-05
8,116 | LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes | 2024 | VLDB | 4.581507e-05
8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05
8,729 | OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs | 2023 | VLDB | 4.4582221e-05
10,951 | Determining the Largest Overlap between Tables | 2024 | SIGMOD | 4.1945683e-05
11,712 | GeoFlux: Hands-Off Data Integration Leveraging Join Key Knowledge | 2018 | SIGMOD | 4.1945683e-05
11,722 | Deeper: A Data Enrichment System Powered by Deep Web | 2018 | SIGMOD | 4.1945683e-05
11,895 | Finding Quality in Quantity: The Challenge of Discovering Valuable Sources for Integration | 2015 | CIDR | 4.1945683e-05
12,201 | AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables | 2011 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 8 of 8 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers