Database Paper Browser


Goods: Organizing Google's Datasets

Summary: Goods crawls Google's diverse enterprise datasets to build a scalable metadata catalog and to infer relationships among them (such as similarity and provenance) across billions of datasets in a distributed storage landscape. It provides discovery, monitoring, annotation, and relationship-analysis services for enterprise data at scale. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 5189
Venue: SIGMOD
Year: 2016
Pagerank: 0.00019232674
Overall Rank: 610 | 95.76%
DOI: 10.1145/2882903.2903730

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 44 of 44 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695
1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956
1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 0.00011869295
1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533
1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787
2,269 | Ground: A Data Context Service | 2017 | CIDR | 9.147379e-05
2,359 | Data Market Platforms: Trading Data Assets to Solve Data Problems | 2020 | VLDB | 8.9607667e-05
2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05
2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05
3,358 | Organizing Data Lakes for Navigation | 2020 | SIGMOD | 7.1784949e-05
3,467 | Data Profiling – A Tutorial | 2017 | SIGMOD | 7.069081e-05
3,473 | AI Meets Database: AI4DB and DB4AI | 2021 | SIGMOD | 7.062864e-05
3,690 | Navigating the Data Lake with DATAMARAN: Automatically Extracting Structure from Log Datasets | 2018 | SIGMOD | 6.8384476e-05
3,942 | Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins | 2022 | VLDB | 6.6114622e-05
4,003 | Data Platform for Machine Learning | 2019 | SIGMOD | 6.54347e-05
4,174 | Computation Reuse in Analytics Job Service at Microsoft | 2018 | SIGMOD | 6.3856219e-05
4,607 | Data Integration and Machine Learning: A Natural Synergy | 2018 | SIGMOD | 6.0538827e-05
4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05
5,086 | Improving Reproducibility of Data Science Pipelines through Transparent Provenance Capture | 2020 | VLDB | 5.7078462e-05
5,529 | Data-Driven Domain Discovery for Structured Datasets | 2020 | VLDB | 5.4566641e-05
5,595 | Schemas and Types for JSON Data: from Theory to Practice | 2019 | SIGMOD | 5.4191724e-05
5,978 | Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond | 2021 | SIGMOD | 5.2453012e-05
6,330 | Efficient Construction of Approximate Ad-Hoc ML models Through Materialization and Reuse | 2018 | VLDB | 5.1077416e-05
7,029 | Computational Fact Checking: A Content Management Perspective | 2018 | VLDB | 4.8563777e-05
7,243 | Data Integration and Machine Learning: A Natural Synergy | 2018 | VLDB | 4.7913666e-05
7,643 | Cross Modal Data Discovery over Structured and Unstructured Data Lakes | 2023 | VLDB | 4.6901105e-05
7,745 | Crossing the finish line faster when paddling the Data Lake with KAYAK | 2017 | VLDB | 4.6618625e-05
7,833 | Dependency-Driven Analytics: a Compass for Uncharted Data Oceans | 2017 | CIDR | 4.6382648e-05
7,868 | Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach | 2023 | SIGMOD | 4.6319504e-05
8,116 | LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes | 2024 | VLDB | 4.581507e-05
8,608 | Unity Catalog: Open and Universal Governance for the Lakehouse and Beyond | 2025 | SIGMOD | 4.4853979e-05
8,729 | OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs | 2023 | VLDB | 4.4582221e-05
9,928 | Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search | 2024 | VLDB | 4.2511622e-05
9,961 | QueryArtisan: Generating Data Manipulation Codes for Ad-hoc Analysis in Data Lakes | 2025 | VLDB | 4.2294678e-05
10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05
10,645 | OpenForge: Probabilistic Metadata Integration | 2025 | VLDB | 4.1945683e-05
10,895 | Towards an Objective Metric for Data Value Through Relevance | 2024 | CIDR | 4.1945683e-05
11,063 | Searching Data Lakes for Nested and Joined Data | 2024 | VLDB | 4.1945683e-05
11,316 | Kyrix-J: Visual Discovery of Connected Datasets in a Data Lake | 2022 | CIDR | 4.1945683e-05
11,379 | Fast Dataset Search with Earth Mover’s Distance | 2022 | VLDB | 4.1945683e-05
11,518 | A Demonstration of RELIC: A System for REtrospective Lineage InferenCe of Data Workflows | 2021 | VLDB | 4.1945683e-05
11,665 | Ursprung: Provenance for Large-Scale Analytics Environments | 2019 | SIGMOD | 4.1945683e-05
11,667 | Peering through the Dark: An Owl's View of Inter-job Dependencies and Jobs' Impact in Shared Clusters | 2019 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 9 of 9 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers