Parallel Rule Discovery from Large Datasets by Sampling
Summary: Parallel rule discovery for REEs across tables via multi-round sampling with alpha precision and beta recall guarantees. Deep Q-learning selects predicates for multi-variable rules; tableau boosts recall; parallelization yields 12.2x speedups at 10% sample.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 6475
- Venue: SIGMOD
- Year: 2022
- Pagerank: 4.2294678e-05
- Overall Rank: 9,963 | 30.69%
- DOI: 10.1145/3514221.3526165
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 11 of 11 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05 |
| 9,355 | Discovering Top-k Rules using Subjective and Objective Criteria | 2023 | SIGMOD | 4.3514328e-05 |
| 9,434 | Rock: Cleaning Data by Embedding ML in Logic Rules | 2024 | SIGMOD | 4.3430376e-05 |
| 9,846 | HyperBlocker: Accelerating Rule-based Blocking in Entity Resolution using GPUs | 2025 | VLDB | 4.2721228e-05 |
| 9,847 | Discovering Top-k Relevant and Diversified Rules | 2024 | SIGMOD | 4.2721228e-05 |
| 10,029 | Outliers: The Good, the Bad and the Ugly | 2026 | SIGMOD | 4.1945683e-05 |
| 10,489 | Incremental Rule Discovery in Response to Parameter Updates | 2025 | SIGMOD | 4.1945683e-05 |
| 10,981 | Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and Quality | 2024 | SIGMOD | 4.1945683e-05 |
| 11,001 | Capturing More Associations by Referencing External Graphs | 2024 | VLDB | 4.1945683e-05 |
| 11,111 | Rock: Cleaning Data with both ML and Logic Rules | 2024 | VLDB | 4.1945683e-05 |
| 11,223 | Splitting Tuples of Mismatched Entities | 2023 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 20 of 20 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 49 | Consistent Query Answers in Inconsistent Databases | 1999 | PODS | 0.00067660624 |
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824 |
| 319 | Evaluation of entity resolution approaches on real-world match problems | 2010 | VLDB | 0.00027781866 |
| 473 | Sampling Large Databases for Association Rules | 1996 | VLDB | 0.0002233798 |
| 555 | Discovering Denial Constraints | 2013 | VLDB | 0.00020254908 |
| 894 | A Hybrid Approach to Functional Dependency Discovery | 2016 | SIGMOD | 0.00015556428 |
| 1,188 | On Generating Near-Optimal Tableaux for Conditional Functional Dependencies | 2008 | VLDB | 0.00013441729 |
| 1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164 |
| 1,831 | Synthesizing Entity Matching Rules by Examples | 2018 | VLDB | 0.00010384082 |
| 2,077 | Efficient Discovery of Approximate Dependencies | 2018 | VLDB | 9.6001836e-05 |
| 2,253 | Efficient Denial Constraint Discovery with Hydra | 2018 | VLDB | 9.1937209e-05 |
| 2,483 | Discovery of Approximate (and Exact) Denial Constraints | 2020 | VLDB | 8.6864916e-05 |
| 3,440 | Approximate Denial Constraints | 2020 | VLDB | 7.0918817e-05 |
| 4,127 | A Statistical Perspective on Discovering Functional Dependencies in Noisy Data | 2020 | SIGMOD | 6.4310458e-05 |
| 5,192 | Pattern Functional Dependencies for Data Cleaning | 2020 | VLDB | 5.6375087e-05 |
| 5,252 | Error-bounded Sampling for Analytics on Big Sparse Data | 2014 | VLDB | 5.6024389e-05 |
| 5,613 | Distributed implementations of dependency discovery algorithms | 2019 | VLDB | 5.4102298e-05 |
| 6,042 | MDedup: Duplicate Detection with Matching Dependencies | 2020 | VLDB | 5.2405269e-05 |
| 7,287 | Discovering Association Rules from Big Graphs | 2022 | VLDB | 4.7762276e-05 |
Semantically Similar Papers