Database Paper Browser

Web-scale Data Integration: You can only afford to Pay As You Go

Summary: Argues that traditional tightly-coupled integration fails at web scale (Deep Web, Google Base) due to extreme heterogeneity and scale. Proposes PAYGO, a dataspaces-inspired pay-as-you-go architecture that delivers incremental, best-effort, cost-aware integration to maximize utility under limited resources. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 94
Venue: CIDR
Year: 2007
Pagerank: 0.00013677658
Overall Rank: 1,147 | 92.03%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 28 of 28 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
257 | Making Database Systems Usable | 2007 | SIGMOD | 0.00030223397
627 | Management of Probabilistic Data: Foundations and Challenges | 2007 | PODS | 0.00018959005
667 | Incremental Knowledge Base Construction Using DeepDive | 2015 | VLDB | 0.00018440557
692 | Pay-as-you-go User Feedback for Dataspace Systems | 2008 | SIGMOD | 0.00018083948
1,537 | Google's Deep-Web Crawl | 2008 | VLDB | 0.00011465704
2,012 | DB&IR: Both Sides Now (Extended Abstract) | 2007 | SIGMOD | 9.7951657e-05
2,771 | A Relational Approach to Incrementally Extracting and Querying Structure in Unstructured Data | 2007 | VLDB | 8.1421432e-05
3,977 | BLAST: a Loosely Schema-aware Meta-blocking Approach for Entity Resolution | 2016 | VLDB | 6.5736268e-05
3,995 | How Large Language Models Will Disrupt Data Management | 2023 | VLDB | 6.5513237e-05
4,185 | Arnold: Declarative Crowd-Machine Data Integration | 2013 | CIDR | 6.3776356e-05
4,229 | Harnessing the Deep Web: Present and Future | 2009 | CIDR | 6.3399547e-05
4,508 | iTrails: Pay-as-you-go Information Integration in Dataspaces | 2007 | VLDB | 6.1298098e-05
5,228 | Schema-agnostic vs Schema-based Configurations for Blocking Methods on Homogeneous Data | 2016 | VLDB | 5.6158315e-05
5,571 | HAMSTER: Using Search Clicklogs for Schema and Taxonomy Matching | 2009 | VLDB | 5.4283499e-05
5,774 | A Hierarchical Approach to Model Web Query Interfaces for Web Source Integration | 2009 | VLDB | 5.3313642e-05
7,681 | SXPath - Extending XPath towards Spatial Querying on Web Documents | 2011 | VLDB | 4.6804276e-05
8,007 | A Grammar-based Entity Representation Framework for Data Cleaning | 2009 | SIGMOD | 4.6068018e-05
8,008 | Entity Resolution On-Demand | 2022 | VLDB | 4.6067684e-05
8,554 | Search Driven Analysis of Heterogeneous XML Data | 2009 | CIDR | 4.4937074e-05
8,696 | Effective Entity Augmentation By Querying External Data Sources | 2023 | VLDB | 4.4660032e-05
8,823 | The Role of Schema Matching in Large Enterprises | 2009 | CIDR | 4.4415658e-05
8,878 | Learning to Extract Form Labels | 2008 | VLDB | 4.4302126e-05
9,943 | Stop Word and Related Problems in Web Interface Integration | 2009 | VLDB | 4.2456408e-05
12,113 | Mob Data Sourcing | 2012 | SIGMOD | 4.1945683e-05
12,184 | Pay-As-You-Go Mapping Selection in Dataspaces | 2011 | SIGMOD | 4.1945683e-05
12,223 | Schema Clustering and Retrieval for Multi-domain Pay-As-You-Go Data Integration Systems | 2010 | SIGMOD | 4.1945683e-05
12,240 | Creating and Exploring Web Form Repositories | 2010 | SIGMOD | 4.1945683e-05
12,320 | Vispedia: On-demand Data Integration for Interactive Visualization and Exploration | 2009 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 14 of 14 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
