When the Web is your Data Lake: Creating a Search Engine for Datasets on the Web
Summary: Dataset Search enables discovery of datasets across the Web, spanning government and research data providers. It proposes an open metadata/citation ecosystem and outlines a scalable, heterogeneous search architecture that treats data as a first-class citizen. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Natasha Noy
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,643 | Cross Modal Data Discovery over Structured and Unstructured Data Lakes | 2023 | VLDB | 4.6901105e-05 |
| 9,961 | QueryArtisan: Generating Data Manipulation Codes for Ad-hoc Analysis in Data Lakes | 2025 | VLDB | 4.2294678e-05 |
| 10,797 | A Demonstration of QueryArtisan: Real-Time Data Lake Analysis via Dynamically Generated Data Manipulation Code | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 0 of 0 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,139 | DataSift: A Crowd-Powered Search Toolkit | 2014 | SIGMOD | 4.3866006e-05 |
| 13,486 | Managing Scientific Data: Lessons, Challenges, and Opportunities | 2011 | SIGMOD | - |
| 2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05 |
| 11,910 | Demonstrating "Data Near Here": Scientific Data Search | 2015 | SIGMOD | 4.1945683e-05 |
| 13,468 | Data Management on the Spatial Web | 2012 | VLDB | - |
| 13,531 | Voyagers and Voyeurs: Supporting Social Data Analysis | 2009 | SIGMOD | - |
| 13,143 | Bridging Disciplines in Data Management Research to Solve Complex Data Problems | 2025 | VLDB | - |
| 12,401 | Large-Scale Collaborative Analysis and Extraction of Web Data | 2008 | VLDB | 4.1945683e-05 |
| 13,277 | The Challenge of Building Effective Data Lakes | 2020 | SIGMOD | - |
| 10,439 | Finding What You’re Looking For: A Distribution-Aware Dataset Search Engine in Action | 2025 | SIGMOD | 4.1945683e-05 |