Database Paper Browser


BigDansing: A System for Big Data Cleansing

Summary: BigDansing is a scalable system for big-data cleansing. It lets users express rules declaratively or procedurally and compiles them into distributed transformations that use shared scans and specialized joins on top of a DBMS or MapReduce, delivering up to 100x speedups while preserving repair quality. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 5041
Venue: SIGMOD
Year: 2015
Pagerank: 7.8372441e-05
Overall Rank: 2,946 | 79.51%
DOI: 10.1145/2723372.2747646

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 34 of 34 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905
2,077 | Efficient Discovery of Approximate Dependencies | 2018 | VLDB | 9.6001836e-05
2,175 | Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services | 2017 | SIGMOD | 9.3644117e-05
2,450 | Functional Dependencies for Graphs | 2016 | SIGMOD | 8.7882979e-05
3,265 | RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! - | 2018 | VLDB | 7.3083672e-05
3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05
3,571 | Lightning Fast and Space Efficient Inequality Joins | 2015 | VLDB | 6.9580858e-05
4,273 | Cleaning Denial Constraint Violations through Relaxation | 2020 | SIGMOD | 6.3003864e-05
4,904 | Temporal Rules Discovery for Web Data Cleaning | 2016 | VLDB | 5.8399195e-05
5,205 | ANMAT: Automatic Knowledge Discovery and Error Detection through Pattern Functional Dependencies | 2019 | SIGMOD | 5.630869e-05
5,618 | Explaining Repaired Data with CFDs | 2018 | VLDB | 5.4079415e-05
5,729 | KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing | 2015 | VLDB | 5.3506368e-05
6,690 | Parallel Discrepancy Detection and Incremental Detection | 2021 | VLDB | 4.9621556e-05
7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05
7,237 | CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning | 2017 | VLDB | 4.7928651e-05
8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05
8,422 | Deducing Certain Fixes to Graphs | 2019 | VLDB | 4.5167705e-05
8,745 | Sparcle: Boosting the Accuracy of Data Cleaning Systems through Spatial Awareness | 2024 | VLDB | 4.456315e-05
8,836 | Fast Approximate Denial Constraint Discovery | 2023 | VLDB | 4.4393184e-05
9,001 | The Power of Nested Parallelism in Big Data Processing – Hitting Three Flies with One Slap – | 2021 | SIGMOD | 4.4107627e-05
9,077 | VerifAI: Verified Generative AI | 2024 | CIDR | 4.4010762e-05
9,240 | ZIP: Lazy Imputation during Query Processing | 2024 | VLDB | 4.3690661e-05
9,278 | Interactive and Deterministic Data Cleaning: A Tossed Stone Raises a Thousand Ripples | 2016 | SIGMOD | 4.3639892e-05
9,434 | Rock: Cleaning Data by Embedding ML in Logic Rules | 2024 | SIGMOD | 4.3430376e-05
9,810 | Rheem: Enabling Multi-Platform Task Execution | 2016 | SIGMOD | 4.278405e-05
10,026 | Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints | 2026 | SIGMOD | 4.1945683e-05
10,512 | Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables | 2025 | SIGMOD | 4.1945683e-05
10,723 | UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow | 2025 | VLDB | 4.1945683e-05
11,178 | LinCQA: Faster Consistent Query Answering with Linear Time Guarantees | 2023 | SIGMOD | 4.1945683e-05
11,369 | PGE: Robust Product Graph Embedding Learning for Error Detection | 2022 | VLDB | 4.1945683e-05
11,682 | IHCS: An Integrated Hybrid Cleaning System | 2019 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 19 of 19 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614
4 | Pregel: A System for Large-Scale Graph Processing | 2010 | SIGMOD | 0.0019005923
70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166
152 | An Evaluation of Non-Equijoin Algorithms | 1991 | VLDB | 0.00040963225
265 | A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification | 2005 | SIGMOD | 0.00029763412
542 | Shark: SQL and Rich Analytics at Scale | 2013 | SIGMOD | 0.00020595648
656 | ERACER: A Database Approach for Statistical Inference and Data Cleaning | 2010 | SIGMOD | 0.00018588729
833 | Guided Data Repair | 2011 | VLDB | 0.00016138432
881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103
1,012 | NADEEF: A Commodity Data Cleaning System | 2013 | SIGMOD | 0.0001464733
1,074 | Processing Theta-Joins using MapReduce* | 2011 | SIGMOD | 0.00014260096
1,197 | The LLUNATIC Data-Cleaning Framework | 2013 | VLDB | 0.00013390321
1,280 | Automatic Optimization for MapReduce Programs | 2011 | VLDB | 0.0001285503
1,624 | Sampling the Repairs of Functional Dependency Violations under Hard Constraints | 2010 | VLDB | 0.00011099222
2,184 | A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data | 2014 | SIGMOD | 9.3429789e-05
2,231 | Dedoop: Efficient Deduplication with Hadoop | 2012 | VLDB | 9.2304499e-05
2,823 | Interaction between Record Matching and Data Repairing | 2011 | SIGMOD | 8.0593894e-05
3,192 | Towards Dependable Data Repairing with Fixing Rules | 2014 | SIGMOD | 7.4095761e-05
7,958 | CARTILAGE: Adding Flexibility to the Hadoop Skeleton | 2013 | SIGMOD | 4.613363e-05

Semantically Similar Papers