Database Paper Browser


Reptile: Aggregation-level Explanations for Hierarchical Data

Summary: Reptile is an iterative, human-in-the-loop system that explains and cleans hierarchical data. It learns group-level statistics and uses them to guide drill-downs toward the groups responsible for errors in distributive aggregates. The paper introduces factorised learning for aggregation-join queries with optimisations specific to hierarchical data, reports >6× speedups, and describes real-world deployments on Covid-19 data and African farmer surveys, where the system was used for policy-relevant data cleaning. (summarized by gpt-5-nano on Feb 09 2026)
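The drill-down idea in the summary can be illustrated with a minimal sketch. This is not Reptile's algorithm or its learned model: the function names, the sample counts, and the "mean of siblings" predictor are all hypothetical stand-ins chosen for illustration. The sketch assumes a user complains that a parent SUM should equal some target, and ranks child groups by how close the parent gets to that target when one child's value is replaced with its expected value. Because SUM is distributive, the parent can be recomputed from the child values alone.

```python
def expected_from_siblings(counts, group):
    """Stand-in model (assumption): predict a group's value as the mean of its siblings."""
    others = [v for g, v in counts.items() if g != group]
    return sum(others) / len(others)


def rank_drilldown(counts, target_total):
    """Rank child groups by how close the parent SUM gets to target_total
    when that single group's observed value is replaced by its expected value."""
    total = sum(counts.values())
    scored = []
    for group, observed in counts.items():
        # SUM is distributive: adjust the parent total using child values only.
        repaired_total = total - observed + expected_from_siblings(counts, group)
        scored.append((abs(repaired_total - target_total), group))
    # Smallest remaining gap first: the most promising group to drill into.
    return [group for _, group in sorted(scored)]


# Hypothetical per-district report counts; district "C" looks under-reported.
counts = {"A": 100, "B": 110, "C": 10, "D": 90}
ranking = rank_drilldown(counts, target_total=400)
print(ranking)
```

With these made-up numbers, repairing "C" alone brings the total to the target, so it is ranked first. A real system would replace the sibling-mean predictor with a model trained on the hierarchy and auxiliary data.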

Paper ID: 6306
Venue: SIGMOD
Year: 2022
Pagerank: 4.2721228e-05
Overall Rank: 9,849 | 31.49%
DOI: 10.1145/3514221.3517854

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 1 of 1 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
10,875 | SDEcho: Efficient Explanation of Aggregated Sequence Difference | 2025 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 27 of 27 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
214 | Scorpion: Explaining Away Outliers in Aggregate Queries | 2013 | VLDB | 0.0003363692
555 | Discovering Denial Constraints | 2013 | VLDB | 0.00020254908
656 | ERACER: A Database Approach for Statistical Inference and Data Cleaning | 2010 | SIGMOD | 0.00018588729
767 | Explaining differences in multidimensional aggregates | 1999 | VLDB | 0.00016981309
833 | Guided Data Repair | 2011 | VLDB | 0.00016138432
834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159
881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103
942 | A Formal Approach to Finding Explanations for Database Queries | 2014 | SIGMOD | 0.00015155714
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 0.00011869295
1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
2,126 | MacroBase: Prioritizing Attention in Fast Data | 2017 | SIGMOD | 9.4887794e-05
2,154 | DIFF: A Relational Interface for Large-Scale Data Explanation | 2019 | VLDB | 9.4208667e-05
2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05
3,976 | UGuide – User-Guided Discovery of FD-Detectable Errors | 2017 | SIGMOD | 6.5736462e-05
4,197 | Incremental View Maintenance with Triple Lock Factorization Benefits | 2018 | SIGMOD | 6.367895e-05
4,273 | Cleaning Denial Constraint Violations through Relaxation | 2020 | SIGMOD | 6.3003864e-05
4,693 | Multi-Structural Databases | 2005 | PODS | 5.9955924e-05
5,191 | Going Beyond Provenance: Explaining Query Answers with Pattern-based Counterbalances | 2019 | SIGMOD | 5.6378768e-05
5,660 | Descriptive and Prescriptive Data Cleaning | 2014 | SIGMOD | 5.3847321e-05
5,729 | KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing | 2015 | VLDB | 5.3506368e-05
5,955 | LMFAO: An Engine for Batches of Group-By Aggregates | 2020 | VLDB | 5.2572882e-05
6,941 | Estimating the Impact of Unknown Unknowns on Aggregate Query Results | 2016 | SIGMOD | 4.8924e-05
7,071 | Smart Drill-Down: A New Data Exploration Operator | 2015 | VLDB | 4.8429461e-05
8,104 | The Cascading Analysts Algorithm | 2018 | SIGMOD | 4.5851358e-05

Semantically Similar Papers