Database Paper Browser

Correlation Sketches for Approximate Join-Correlation Queries

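Summary: Introduces join-correlation queries for data augmentation: given a query table T_Q with join key K_Q and a query column Q, find tables T_X that are joinable with T_Q on K_Q and contain a column C correlated with Q. Proposes correlation sketches, which index tables so that these correlations can be estimated, and a scoring scheme to rank the results; experiments validate the accuracy of the estimates. (summarized by gpt-5-nano on Feb 09 2026)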
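
The idea in the summary can be illustrated with a minimal Python sketch. It assumes a bottom-k style key sample (keep the values whose hashed join keys are smallest) and a plain Pearson estimate over the keys two sketches share; the paper's actual sketch construction, estimators, and scoring may differ. The names `key_hash`, `build_sketch`, `pearson`, and `estimate_join_correlation` are hypothetical and chosen only for this illustration.

```python
import hashlib
import math


def key_hash(key):
    """Map a join key to a deterministic 64-bit integer (illustrative choice of hash)."""
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_sketch(rows, size):
    """Keep the values whose hashed keys are the `size` smallest.

    `rows` is an iterable of (key, numeric value) pairs; keys are assumed unique.
    Because every table uses the same hash, sketches of tables that share keys
    tend to retain the same keys, so they can be joined directly.
    """
    hashed = sorted((key_hash(k), v) for k, v in rows)
    return dict(hashed[:size])


def pearson(xs, ys):
    """Sample Pearson correlation, or None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def estimate_join_correlation(sketch_q, sketch_x):
    """Join two sketches on hashed keys and correlate the paired values.

    Returns (estimate, number of shared keys used for the estimate).
    """
    common = sorted(sketch_q.keys() & sketch_x.keys())
    xs = [sketch_q[h] for h in common]
    ys = [sketch_x[h] for h in common]
    return pearson(xs, ys), len(common)


# Toy data: query column Q and candidate column C over the same 1,000 keys.
query_rows = [(k, float(k)) for k in range(1000)]
candidate_rows = [(k, 2.0 * k + 1.0) for k in range(1000)]

sketch_q = build_sketch(query_rows, size=200)
sketch_x = build_sketch(candidate_rows, size=200)
estimate, shared = estimate_join_correlation(sketch_q, sketch_x)
```

In this toy setup each sketch stores 200 of the 1,000 rows, and the estimate is computed from the sketches alone. Ranking candidate tables would then amount to sorting them by a score derived from such estimates; how that score should account for the number of shared keys is left open here.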

Paper ID: 6261
Venue: SIGMOD
Year: 2021
Pagerank: 6.7260705e-05
Overall Rank: 3,824 | 73.40%
DOI: 10.1145/3448016.3458456

Incoming Citations (Sorted by Pagerank)

Showing 17 of 17 citing papers.

Rank Citing Paper Year Venue Pagerank
4,967 Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation 2022 SIGMOD 5.7956612e-05
5,024 Towards Distribution-aware Query Answering in Data Markets 2022 VLDB 5.7535043e-05
5,976 Responsible Data Integration: Next-generation Challenges 2022 SIGMOD 5.245976e-05
6,270 MATE: Multi-Attribute Table Extraction 2022 VLDB 5.1337451e-05
6,449 Causal Data Integration 2023 VLDB 5.0587746e-05
7,732 Double-Anonymous Sketch: Achieving Top-K-fairness for Finding Global Top-K Frequent Items 2023 SIGMOD 4.6657123e-05
8,250 Stingy Sketch: A Sketch Framework for Accurate and Fast Frequency Estimation 2022 VLDB 4.5506131e-05
8,618 Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data 2024 SIGMOD 4.4838259e-05
8,696 Effective Entity Augmentation By Querying External Data Sources 2023 VLDB 4.4660032e-05
9,644 Fair and Actionable Causal Prescription Ruleset 2025 SIGMOD 4.3109001e-05
10,142 AutoDDG: Automated Dataset Description Generation using Large Language Models 2026 SIGMOD 4.1945683e-05
10,628 CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines 2025 VLDB 4.1945683e-05
10,836 Data Discovery in Data Lakes: Operations, Indexes, Systems 2025 VLDB 4.1945683e-05
11,025 Sampling Methods for Inner Product Sketching 2024 VLDB 4.1945683e-05
11,054 Enriching Relations with Additional Attributes for ER 2024 VLDB 4.1945683e-05
11,097 Navigating Data Repositories: Utilizing Line Charts to Discover Relevant Datasets 2024 VLDB 4.1945683e-05
11,168 Weighted Minwise Hashing Beats Linear Sketching for Inner Product Estimation 2023 PODS 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 23 of 23 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
59 Sampling-Based Estimation of the Number of Distinct Values of an Attribute 1995 VLDB 0.00064501896
92 Practical Selectivity Estimation through Adaptive Sampling 1990 SIGMOD 0.00051315959
107 WebTables: Exploring the Power of Tables on the Web 2008 VLDB 0.00048377684
204 Learned Cardinalities: Estimating Correlated Joins with Deep Learning 2019 CIDR 0.00034784455
211 Join Synopses for Approximate Query Answering 1999 SIGMOD 0.00033981214
325 The History of Histograms (abridged) 2003 VLDB 0.00027378328
378 Towards Estimation Error Guarantees for Distinct Values 2000 PODS 0.0002497492
553 Bifocal Sampling for Skew-Resistant Join Size Estimation 1996 SIGMOD 0.00020272061
727 On Synopses for Distinct-Value Estimation Under Multiset Operations 2007 SIGMOD 0.00017508726
758 Deep Unsupervised Cardinality Estimation 2020 VLDB 0.0001706608
1,105 Cardinality Estimation Done Right: Index-Based Join Sampling 2017 CIDR 0.00013990395
1,178 Table Union Search on Open Data 2018 VLDB 0.00013468118
1,187 JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes 2019 SIGMOD 0.00013443639
1,193 Join Size Estimation Subject to Filter Conditions 2015 VLDB 0.00013414989
1,277 The Data Civilizer System 2017 CIDR 0.00012879695
1,463 ARDA: Automatic Relational Data Augmentation for Machine Learning 2020 VLDB 0.00011869295
1,644 Finding Related Tables in Data Lakes for Interactive Data Science 2020 SIGMOD 0.00011041787
1,683 Cardinality Estimation: An Experimental Survey 2018 VLDB 0.00010922679
2,045 Multi-Dimensional Clustering: A New Data Layout Scheme in DB2 2003 SIGMOD 9.6939983e-05
2,141 LSH Ensemble: Internet-Scale Domain Search 2016 VLDB 9.4542625e-05
2,254 Two-Level Sampling for Join Size Estimation 2017 SIGMOD 9.1897043e-05
3,928 Tighter Estimation using Bottom-k Sketches 2008 VLDB 6.6254568e-05
6,493 Joins on Samples: A Theoretical Guide for Practitioners 2020 VLDB 5.0424713e-05

Semantically Similar Papers