Database Paper Browser

Rock: Cleaning Data by Embedding ML in Logic Rules

Summary: Rock unifies machine learning and logic by embedding ML classifiers as predicates in logic rules, which it applies to entity resolution, conflict resolution, timeliness, and missing-value imputation. It supports batch and incremental rule learning, error detection, and error correction based on both rules and ground truth. (summarized by gpt-5-nano on Feb 09 2026)
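To make the central idea concrete, here is a minimal Python sketch of an entity-resolution rule whose precondition mixes an ordinary equality predicate with an ML predicate, in the spirit of t.phone = s.phone AND M(t.name, s.name) -> t.id = s.id. It is not Rock's rule language or API. The names `ml_name_match`, `eq`, `rule`, and `resolve`, the sample records, and the token-Jaccard threshold that stands in for a trained classifier are all hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Predicate = Callable[[Record, Record], bool]


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def ml_name_match(t: Record, s: Record) -> bool:
    """Hypothetical ML predicate M(t.name, s.name).

    A real system would call a trained classifier here; this stand-in
    (an assumption, not Rock's model) thresholds token Jaccard at 0.5.
    """
    return jaccard(t["name"], s["name"]) >= 0.5


def eq(attr: str) -> Predicate:
    """Ordinary logic predicate t.attr = s.attr."""
    return lambda t, s: t[attr] == s[attr]


def rule(*preconditions: Predicate) -> Predicate:
    """A rule fires on (t, s) only if every precondition holds (conjunction)."""
    return lambda t, s: all(p(t, s) for p in preconditions)


# Hypothetical rule: t.phone = s.phone AND M(t.name, s.name) -> t.id = s.id
phi = rule(eq("phone"), ml_name_match)


def resolve(records: List[Record], r: Predicate) -> List[Tuple[str, str]]:
    """Return the id pairs that rule r deduces to refer to the same entity."""
    return [(t["id"], s["id"]) for t, s in combinations(records, 2) if r(t, s)]


records = [
    {"id": "r1", "name": "John A Smith", "phone": "555-0100"},
    {"id": "r2", "name": "John Smith", "phone": "555-0100"},
    {"id": "r3", "name": "Jane Doe", "phone": "555-0100"},
    {"id": "r4", "name": "John Smith", "phone": "555-0199"},
]

print(resolve(records, phi))  # [('r1', 'r2')]
```

Here only r1 and r2 agree on phone and have similar enough names (Jaccard 2/3). Because `all` short-circuits, the cheap equality check runs before the classifier, so the ML predicate is only evaluated on pairs that already agree on phone; whether Rock orders predicates this way is not stated in the summary.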

Paper ID: 6780
Venue: SIGMOD
Year: 2024
Pagerank: 4.3430376e-05
Overall Rank: 9,434 | 34.37%
DOI: 10.1145/3626246.3653372

Incoming Citations (Sorted by Pagerank)

Showing 7 of 7 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 44 of 44 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Overall Rank | Cited Paper | Year | Venue | Pagerank
8 | Extending the Data Base Relational Model to Capture More Meaning | 1979 | SIGMOD | 0.0015385917
49 | Consistent Query Answers in Inconsistent Databases | 1999 | PODS | 0.00067660624
66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824
265 | A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification | 2005 | SIGMOD | 0.00029763412
300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466
319 | Evaluation of entity resolution approaches on real-world match problems | 2010 | VLDB | 0.00027781866
509 | On Active Learning of Record Matching Packages | 2010 | SIGMOD | 0.00021409518
623 | Improving Data Quality: Consistency and Accuracy | 2007 | VLDB | 0.00018996374
712 | Magellan: Toward Building Entity Matching Management Systems | 2016 | VLDB | 0.00017732426
754 | Distributed Representations of Tuples for Entity Resolution | 2018 | VLDB | 0.00017117211
1,159 | Towards Certain Fixes with Editing Rules and Master Data | 2010 | VLDB | 0.00013592813
1,188 | On Generating Near-Optimal Tableaux for Conditional Functional Dependencies | 2008 | VLDB | 0.00013441729
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
2,175 | Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services | 2017 | SIGMOD | 9.3644117e-05
2,483 | Discovery of Approximate (and Exact) Denial Constraints | 2020 | VLDB | 8.6864916e-05
2,946 | BigDansing: A System for Big Data Cleansing | 2015 | SIGMOD | 7.8372441e-05
2,958 | The Role of Massively Multi-Task and Weak Supervision in Software 2.0 | 2019 | CIDR | 7.8173975e-05
2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05
3,299 | SCODED: Statistical Constraint Oriented Data Error Detection | 2020 | SIGMOD | 7.2546659e-05
3,711 | Saga: A Platform for Continuous Construction and Serving of Knowledge At Scale | 2022 | SIGMOD | 6.823609e-05
3,773 | Cleaning Crowdsourced Labels Using Oracles for Statistical Classification | 2019 | VLDB | 6.7758649e-05
4,127 | A Statistical Perspective on Discovering Functional Dependencies in Noisy Data | 2020 | SIGMOD | 6.4310458e-05
4,448 | The Interaction between Functional Dependencies and Template Dependencies | 1980 | SIGMOD | 6.1785017e-05
5,153 | Horizon: Scalable Dependency-driven Data Cleaning | 2021 | VLDB | 5.6607963e-05
5,557 | Determining the Currency of Data | 2011 | PODS | 5.435361e-05
5,978 | Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond | 2021 | SIGMOD | 5.2453012e-05
6,350 | NADEEF: A Generalized Data Cleaning System | 2013 | VLDB | 5.101815e-05
6,569 | Domain Adaptation for Deep Entity Resolution | 2022 | SIGMOD | 5.0065379e-05
6,690 | Parallel Discrepancy Detection and Incremental Detection | 2021 | VLDB | 4.9621556e-05
7,165 | A Lightweight and Efficient Temporal Database Management System in TDSQL | 2019 | VLDB | 4.8130937e-05
8,406 | DADER: Hands-Off Entity Resolution with Domain Adaptation | 2022 | VLDB | 4.5220083e-05
8,422 | Deducing Certain Fixes to Graphs | 2019 | VLDB | 4.5167705e-05
8,503 | A Demonstration of KGLac: A Data Discovery and Enrichment Platform for Data Science | 2021 | VLDB | 4.496339e-05
9,273 | ActiveDeeper: A Model-based Active Data Enrichment System | 2020 | VLDB | 4.3649603e-05
9,355 | Discovering Top-k Rules using Subjective and Objective Criteria | 2023 | SIGMOD | 4.3514328e-05
9,577 | CoClean: Collaborative Data Cleaning | 2020 | SIGMOD | 4.3248438e-05
9,894 | OceanRT: Real-Time Analytics over Large Temporal Data | 2014 | SIGMOD | 4.2602616e-05
9,963 | Parallel Rule Discovery from Large Datasets by Sampling | 2022 | SIGMOD | 4.2294678e-05
11,209 | Enriching Recommendation Models with Logic Conditions | 2023 | SIGMOD | 4.1945683e-05
11,223 | Splitting Tuples of Mismatched Entities | 2023 | SIGMOD | 4.1945683e-05
11,234 | Learning and Deducing Temporal Orders | 2023 | VLDB | 4.1945683e-05

Semantically Similar Papers