Database Paper Browser


Splitting Tuples of Mismatched Entities

Summary: Studies inverse entity resolution (ER): splitting tuples that wrongly merge information about distinct entities into separate records. The paper proposes a rule-based tuple-splitting scheme that embeds ML models for attribute correlation and missing-value imputation. It draws on cross-relational alignment and knowledge-graph data and reports an F-measure of about 0.92. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6770
Venue: SIGMOD
Year: 2023
Pagerank: 4.1945683e-05
Overall Rank: 11,223 | 21.93%
DOI: 10.1145/3626763

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
9,434 | Rock: Cleaning Data by Embedding ML in Logic Rules | 2024 | SIGMOD | 4.3430376e-05
10,489 | Incremental Rule Discovery in Response to Parameter Updates | 2025 | SIGMOD | 4.1945683e-05
11,054 | Enriching Relations with Additional Attributes for ER | 2024 | VLDB | 4.1945683e-05
11,111 | Rock: Cleaning Data with both ML and Logic Rules | 2024 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 39 of 39 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
49 | Consistent Query Answers in Inconsistent Databases | 1999 | PODS | 0.00067660624
62 | Freebase: A Collaboratively Created Graph Database For Structuring Human Knowledge | 2008 | SIGMOD | 0.0006429466
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824
300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466
319 | Evaluation of entity resolution approaches on real-world match problems | 2010 | VLDB | 0.00027781866
509 | On Active Learning of Record Matching Packages | 2010 | SIGMOD | 0.00021409518
623 | Improving Data Quality: Consistency and Accuracy | 2007 | VLDB | 0.00018996374
643 | Corleone: Hands-Off Crowdsourcing for Entity Matching | 2014 | SIGMOD | 0.00018754451
754 | Distributed Representations of Tuples for Entity Resolution | 2018 | VLDB | 0.00017117211
881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103
1,159 | Towards Certain Fixes with Editing Rules and Master Data | 2010 | VLDB | 0.00013592813
1,197 | The LLUNATIC Data-Cleaning Framework | 2013 | VLDB | 0.00013390321
1,831 | Synthesizing Entity Matching Rules by Examples | 2018 | VLDB | 0.00010384082
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
2,175 | Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services | 2017 | SIGMOD | 9.3644117e-05
2,276 | Mind the Gap: An Experimental Evaluation of Imputation of Missing Values Techniques in Time Series | 2020 | VLDB | 9.1261944e-05
2,767 | A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching | 2020 | SIGMOD | 8.1513883e-05
2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05
3,140 | ZeroER: Entity Resolution using Zero Labeled Examples | 2020 | SIGMOD | 7.4841763e-05
3,311 | Efficient and Effective Data Imputation with Influence Functions | 2022 | VLDB | 7.2406486e-05
3,640 | Deep Learning for Blocking in Entity Matching: A Design Space Exploration | 2021 | VLDB | 6.8891671e-05
4,273 | Cleaning Denial Constraint Violations through Relaxation | 2020 | SIGMOD | 6.3003864e-05
4,332 | Missing Value Imputation on Multidimensional Time Series | 2021 | VLDB | 6.2805243e-05
4,448 | The Interaction between Functional Dependencies and Template Dependencies | 1980 | SIGMOD | 6.1785017e-05
5,153 | Horizon: Scalable Dependency-driven Data Cleaning | 2021 | VLDB | 5.6607963e-05
5,192 | Pattern Functional Dependencies for Data Cleaning | 2020 | VLDB | 5.6375087e-05
6,042 | MDedup: Duplicate Detection with Matching Dependencies | 2020 | VLDB | 5.2405269e-05
6,727 | ORBITS: Online Recovery of Missing Values in Multiple Time Series Streams | 2021 | VLDB | 4.9483604e-05
6,810 | Record Linkage with Uniqueness Constraints and Erroneous Values | 2010 | VLDB | 4.9203397e-05
7,066 | On Multiple Semantics for Declarative Database Repairs | 2020 | SIGMOD | 4.8445108e-05
8,005 | Online Topic-Aware Entity Resolution Over Incomplete Data Streams | 2021 | SIGMOD | 4.6081461e-05
8,138 | Fast and Reliable Missing Data Contingency Analysis with Predicate-Constraints | 2020 | SIGMOD | 4.5771031e-05
8,422 | Deducing Certain Fixes to Graphs | 2019 | VLDB | 4.5167705e-05
8,875 | CerFix: A System for Cleaning Data with Certain Fixes | 2011 | VLDB | 4.430475e-05
9,020 | Entity Matching in the Wild: A Consistent and Versatile Framework to Unify Data in Industrial Applications | 2020 | SIGMOD | 4.4079449e-05
9,355 | Discovering Top-k Rules using Subjective and Objective Criteria | 2023 | SIGMOD | 4.3514328e-05
9,896 | Towards Interpretable and Learnable Risk Analysis for Entity Resolution | 2020 | SIGMOD | 4.2600049e-05
9,963 | Parallel Rule Discovery from Large Datasets by Sampling | 2022 | SIGMOD | 4.2294678e-05

Semantically Similar Papers