Correlation Sketches for Approximate Join-Correlation Queries
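Summary: Introduces join-correlation queries for data augmentation: given a query table TQ with join key KQ and query column CQ, find tables TX that are joinable with TQ on KQ and contain a column CX correlated with CQ. Proposes correlation sketches, compact per-table summaries used as an index from which post-join correlations can be estimated without computing the joins, plus a scoring scheme to rank candidate tables; experiments validate the accuracy of the estimates.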
(summarized by gpt-5-nano on Feb 09 2026)
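To make the summary concrete, here is a minimal Python sketch of the general idea, assuming a bottom-k (KMV-style) construction: each table keeps only the k join keys with the smallest hash values, together with the value associated with each key. Because both sides use the same hash function, the keys shared by two sketches form a hash-based sample of the join, and a correlation can be computed on that sample. The function names (`build_sketch`, `estimate_join_correlation`), the SHA-1 hashing, the mean aggregation for repeated keys, and the use of Pearson's coefficient are illustrative assumptions, not the paper's exact method.

```python
import hashlib
import math
from typing import Dict, List, Tuple


def key_hash(key: str) -> float:
    """Map a join key to a pseudo-uniform float in [0, 1] (assumption: SHA-1 based)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def build_sketch(keys: List[str], values: List[float], k: int) -> Dict[float, float]:
    """Keep the k distinct keys with the smallest hashes, mapped to their mean value."""
    sums: Dict[str, Tuple[float, int]] = {}
    for key, v in zip(keys, values):
        s, n = sums.get(key, (0.0, 0))
        sums[key] = (s + v, n + 1)  # aggregate repeated keys (mean, an assumption)
    ranked = sorted(sums, key=key_hash)[:k]  # bottom-k distinct keys by hash
    return {key_hash(key): sums[key][0] / sums[key][1] for key in ranked}


def pearson(xs: List[float], ys: List[float]) -> float:
    """Plain Pearson correlation; NaN if either side has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def estimate_join_correlation(sq: Dict[float, float], sx: Dict[float, float]) -> float:
    """Join two sketches on hashed keys and estimate the correlation on the overlap."""
    common = sorted(set(sq) & set(sx))
    if len(common) < 2:
        return float("nan")  # not enough joined samples to estimate anything
    return pearson([sq[h] for h in common], [sx[h] for h in common])


if __name__ == "__main__":
    keys = [f"k{i}" for i in range(1000)]
    sq = build_sketch(keys, [float(i) for i in range(1000)], k=256)
    sx = build_sketch(keys, [2.0 * i + 1 for i in range(1000)], k=256)
    print(estimate_join_correlation(sq, sx))
```

Intersecting two bottom-k sketches keeps exactly the shared keys whose hash falls below both sketches' thresholds, so the overlap behaves like a uniform sample of the join keys; the sketch size k trades index space for estimation accuracy.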
- Paper ID: 6261
- Venue: SIGMOD
- Year: 2021
- Pagerank: 6.7260705e-05
- Overall Rank: 3,824 | 73.40%
- DOI: 10.1145/3448016.3458456
Incoming Citations (Sorted by Pagerank)
Showing 17 of 17 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,967 | Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation | 2022 | SIGMOD | 5.7956612e-05 |
| 5,024 | Towards Distribution-aware Query Answering in Data Markets | 2022 | VLDB | 5.7535043e-05 |
| 5,976 | Responsible Data Integration: Next-generation Challenges | 2022 | SIGMOD | 5.245976e-05 |
| 6,270 | MATE: Multi-Attribute Table Extraction | 2022 | VLDB | 5.1337451e-05 |
| 6,449 | Causal Data Integration | 2023 | VLDB | 5.0587746e-05 |
| 7,732 | Double-Anonymous Sketch: Achieving Top-K-fairness for Finding Global Top-K Frequent Items | 2023 | SIGMOD | 4.6657123e-05 |
| 8,250 | Stingy Sketch: A Sketch Framework for Accurate and Fast Frequency Estimation | 2022 | VLDB | 4.5506131e-05 |
| 8,618 | Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data | 2024 | SIGMOD | 4.4838259e-05 |
| 8,696 | Effective Entity Augmentation By Querying External Data Sources | 2023 | VLDB | 4.4660032e-05 |
| 9,644 | Fair and Actionable Causal Prescription Ruleset | 2025 | SIGMOD | 4.3109001e-05 |
| 10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05 |
| 10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05 |
| 10,836 | Data Discovery in Data Lakes: Operations, Indexes, Systems | 2025 | VLDB | 4.1945683e-05 |
| 11,025 | Sampling Methods for Inner Product Sketching | 2024 | VLDB | 4.1945683e-05 |
| 11,054 | Enriching Relations with Additional Attributes for ER | 2024 | VLDB | 4.1945683e-05 |
| 11,097 | Navigating Data Repositories: Utilizing Line Charts to Discover Relevant Datasets | 2024 | VLDB | 4.1945683e-05 |
| 11,168 | Weighted Minwise Hashing Beats Linear Sketching for Inner Product Estimation | 2023 | PODS | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 23 of 23 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 59 | Sampling-Based Estimation of the Number of Distinct Values of an Attribute | 1995 | VLDB | 0.00064501896 |
| 92 | Practical Selectivity Estimation through Adaptive Sampling | 1990 | SIGMOD | 0.00051315959 |
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684 |
| 204 | Learned Cardinalities: Estimating Correlated Joins with Deep Learning | 2019 | CIDR | 0.00034784455 |
| 211 | Join Synopses for Approximate Query Answering | 1999 | SIGMOD | 0.00033981214 |
| 325 | The History of Histograms (abridged) | 2003 | VLDB | 0.00027378328 |
| 378 | Towards Estimation Error Guarantees for Distinct Values | 2000 | PODS | 0.0002497492 |
| 553 | Bifocal Sampling for Skew-Resistant Join Size Estimation | 1996 | SIGMOD | 0.00020272061 |
| 727 | On Synopses for Distinct-Value Estimation Under Multiset Operations | 2007 | SIGMOD | 0.00017508726 |
| 758 | Deep Unsupervised Cardinality Estimation | 2020 | VLDB | 0.0001706608 |
| 1,105 | Cardinality Estimation Done Right: Index-Based Join Sampling | 2017 | CIDR | 0.00013990395 |
| 1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118 |
| 1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639 |
| 1,193 | Join Size Estimation Subject to Filter Conditions | 2015 | VLDB | 0.00013414989 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 0.00011869295 |
| 1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787 |
| 1,683 | Cardinality Estimation: An Experimental Survey | 2018 | VLDB | 0.00010922679 |
| 2,045 | Multi-Dimensional Clustering: A New Data Layout Scheme in DB2 | 2003 | SIGMOD | 9.6939983e-05 |
| 2,141 | LSH Ensemble: Internet-Scale Domain Search | 2016 | VLDB | 9.4542625e-05 |
| 2,254 | Two-Level Sampling for Join Size Estimation | 2017 | SIGMOD | 9.1897043e-05 |
| 3,928 | Tighter Estimation using Bottom-k Sketches | 2008 | VLDB | 6.6254568e-05 |
| 6,493 | Joins on Samples: A Theoretical Guide for Practitioners | 2020 | VLDB | 5.0424713e-05 |
Semantically Similar Papers