Database Paper Browser

On Random Sampling over Joins

Summary: Examines whether the output of a join can be randomly sampled without evaluating the join in full, and gives theoretical limits on how efficiently this can be done. Proposes new join-sampling algorithms for settings where those limits do not apply; an empirical evaluation on SQL Server 7.0 shows efficiency gains. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 3105
Venue: SIGMOD
Year: 1999
Pagerank: 0.00092385438
Overall Rank: 18 | 99.88%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 86 citing papers.

Rank Citing Paper Year Venue Pagerank
3 Pig Latin: A Not-So-Foreign Language for Data Processing 2008 SIGMOD 0.0024183614
43 Models and Issues in Data Stream Systems 2002 PODS 0.00072723062
194 Query Processing, Resource Management, and Approximation in a Data Stream Management System 2003 CIDR 0.00035426067
266 Efficient Exact Set-Similarity Joins 2006 VLDB 0.00029718727
342 EmptyHeaded: A Relational Engine for Graph Processing 2016 SIGMOD 0.00026795977
449 Approximate Query Processing: Taming the TeraBytes! A Tutorial 2001 VLDB 0.00022846068
739 Congressional Samples for Approximate Answering of Group-By Queries 2000 SIGMOD 0.00017401518
758 Deep Unsupervised Cardinality Estimation 2020 VLDB 0.0001706608
943 Wander Join: Online Aggregation via Random Walks 2016 SIGMOD 0.00015145883
1,017 Automatic Physical Database Tuning: A Relaxation-based Approach 2005 SIGMOD 0.00014634307
1,204 VerdictDB: Universalizing Approximate Query Processing 2018 SIGMOD 0.00013319541
1,260 Dynamic Sample Selection for Approximate Query Processing 2003 SIGMOD 0.00012993347
1,262 RankSQL: Query Algebra and Optimization for Relational Top-k Queries 2005 SIGMOD 0.00012986539
1,272 Proactive Re-Optimization 2005 SIGMOD 0.00012920076
1,323 Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters 2016 SIGMOD 0.00012601997
1,335 ICICLES: Self-tuning Samples for Approximate Query Answering 2000 VLDB 0.00012502131
1,369 Random Sampling over Joins Revisited 2018 SIGMOD 0.00012339777
1,425 Scalable Approximate Query Processing With The DBO Engine 2007 SIGMOD 0.00012051353
1,574 Approximate Query Processing: No Silver Bullet 2017 SIGMOD 0.00011287495
1,717 Approximate Join Processing Over Data Streams 2003 SIGMOD 0.00010793312
1,909 SciBORQ: Scientific data management with Bounds On Runtime and Quality 2011 CIDR 0.00010121304
2,035 Generating Example Data for Dataflow Programs 2009 SIGMOD 9.7149269e-05
2,111 When Can We Trust Progress Estimators for SQL Queries? 2005 SIGMOD 9.5286436e-05
2,165 Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation 2015 SIGMOD 9.389622e-05
2,282 Summarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling 2005 VLDB 9.1073603e-05
2,501 DBEst: Revisiting Approximate Query Processing Engines with Machine Learning Models 2019 SIGMOD 8.6453446e-05
2,789 Optimal Sampling from Sliding Windows 2009 PODS 8.1249652e-05
2,808 A Robust, Optimization-Based Approach for Approximate Answering of Aggregate Queries 2001 SIGMOD 8.0870741e-05
2,969 Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models 2017 VLDB 7.7974762e-05
2,995 A Sampling Algebra for Aggregate Estimation 2013 VLDB 7.7587199e-05
3,013 Cardinality Estimation Using Sample Views with Quality Assurance 2007 SIGMOD 7.7137441e-05
3,387 Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration 2020 PODS 7.1573735e-05
3,408 Query Optimizers: Time to Rethink the Contract? 2009 SIGMOD 7.1288167e-05
3,566 Fast Manhattan Sketches in Data Streams 2010 PODS 6.9629443e-05
3,593 Graph-Based Synopses for Relational Selectivity Estimation 2006 SIGMOD 6.9385476e-05
3,954 Efficiently Approximating Selectivity Functions using Low Overhead Regression Models 2020 VLDB 6.5926838e-05
4,029 Spatial Online Sampling and Aggregation 2016 VLDB 6.51315e-05
4,100 A Bi-Level Bernoulli Scheme for Database Sampling 2004 SIGMOD 6.4531387e-05
4,133 Memory-Limited Execution of Windowed Stream Joins 2004 VLDB 6.4196026e-05
4,245 A Disk-Based Join With Probabilistic Guarantees 2005 SIGMOD 6.3272687e-05
4,435 Sampling Dirty Data for Matching Attributes 2010 SIGMOD 6.1918164e-05
4,953 On Join Sampling and the Hardness of Combinatorial Output-Sensitive Join Algorithms 2023 PODS 5.8085795e-05
4,955 Estimating arbitrary subset sums with few probes 2005 PODS 5.8053317e-05
5,050 xPAD: A Platform for Analytic Data Flows 2013 SIGMOD 5.7340229e-05
5,150 Efficient Join Synopsis Maintenance for Data Warehouse 2020 SIGMOD 5.6626586e-05
5,220 Similarity Join Size Estimation using Locality Sensitive Hashing 2011 VLDB 5.6216111e-05
5,401 ALECE: An Attention-based Learned Cardinality Estimator for SPJ Queries on Dynamic Workloads 2024 VLDB 5.5285035e-05
5,511 On Producing Join Results Early 2003 PODS 5.4699346e-05
5,539 Supporting Time-Constrained SQL Queries in Oracle 2007 VLDB 5.4503121e-05
5,644 FluxQuery: An Execution Framework for Highly Interactive Query Workloads 2016 SIGMOD 5.3924275e-05

Outgoing Citations (Sorted by Pagerank)

Showing 6 of 6 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers

Overall Rank Paper Year Venue Pagerank
1,255 Fixed-Precision Estimation of Join Selectivity 1993 PODS 0.00013024064
4,953 On Join Sampling and the Hardness of Combinatorial Output-Sensitive Join Algorithms 2023 PODS 5.8085795e-05
8,470 Sampling Big Ideas in Query Optimization 2023 PODS 4.5038423e-05
4,694 Scalable Reservoir Sampling on Many-Core CPUs 2019 SIGMOD 5.9944898e-05
2,254 Two-Level Sampling for Join Size Estimation 2017 SIGMOD 9.1897043e-05
92 Practical Selectivity Estimation through Adaptive Sampling 1990 SIGMOD 0.00051315959
6,493 Joins on Samples: A Theoretical Guide for Practitioners 2020 VLDB 5.0424713e-05
8,959 Reservoir Sampling over Joins 2024 SIGMOD 4.4206222e-05
1,369 Random Sampling over Joins Revisited 2018 SIGMOD 0.00012339777
46 Simple Random Sampling from Relational Databases 1986 VLDB 0.00070894702