Responsible Data Integration: Next-generation Challenges
Summary: A tutorial on responsible data integration that links data quality and bias to auditing in ML pipelines. It focuses on (1) auditing for quality and bias; (2) integration tasks that call for responsibility measures; and (3) techniques and open problems for responsible data science.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 6407
- Venue: SIGMOD
- Year: 2022
- Pagerank: 5.245976e-05
- Overall Rank: 5,976 | 58.43%
- DOI: 10.1145/3514221.3522567
Incoming Citations (Sorted by Pagerank)
Showing 10 of 10 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,077 | The Fast and the Private: Task-based Dataset Search | 2024 | CIDR | 5.2229324e-05 |
| 7,491 | Saibot: A Differentially Private Data Search Platform | 2023 | VLDB | 4.7180617e-05 |
| 8,281 | Optimizing Data Acquisition to Enhance Machine Learning Performance | 2024 | VLDB | 4.5435639e-05 |
| 9,644 | Fair and Actionable Causal Prescription Ruleset | 2025 | SIGMOD | 4.3109001e-05 |
| 9,712 | Maximizing Fair Content Spread via Edge Suggestion in Social Networks | 2022 | VLDB | 4.299267e-05 |
| 10,223 | On Fair Epsilon Net and Geometric Hitting Set | 2026 | VLDB | 4.1945683e-05 |
| 10,524 | Understanding the Black Box: A Deep Empirical Dive into Shapley Value Approximations for Tabular Data | 2025 | SIGMOD | 4.1945683e-05 |
| 10,955 | Data Acquisition for Improving Model Confidence | 2024 | SIGMOD | 4.1945683e-05 |
| 10,960 | FairHash: A Fair and Memory/Time-efficient Hashmap | 2024 | SIGMOD | 4.1945683e-05 |
| 11,068 | Chameleon: Foundation Models for Fairness-aware Multi-modal Data Augmentation to Enhance Coverage of Minorities | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 27 of 27 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 18 | On Random Sampling over Joins | 1999 | SIGMOD | 0.00092385438 |
| 211 | Join Synopses for Approximate Query Answering | 1999 | SIGMOD | 0.00033981214 |
| 943 | Wander Join: Online Aggregation via Random Walks | 2016 | SIGMOD | 0.00015145883 |
| 1,041 | Interventional Fairness: Causal Database Repair for Algorithmic Fairness | 2019 | SIGMOD | 0.00014482047 |
| 1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118 |
| 1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639 |
| 1,369 | Random Sampling over Joins Revisited | 2018 | SIGMOD | 0.00012339777 |
| 1,751 | Auctus: A Dataset Search Engine for Data Discovery and Augmentation | 2021 | VLDB | 0.00010683295 |
| 2,141 | LSH Ensemble: Internet-Scale Domain Search | 2016 | VLDB | 9.4542625e-05 |
| 2,202 | A Scalable Hash Ripple Join Algorithm | 2002 | SIGMOD | 9.2987417e-05 |
| 2,259 | MithraCoverage: A System for Investigating Population Bias for Intersectional Fairness | 2020 | SIGMOD | 9.167331e-05 |
| 3,358 | Organizing Data Lakes for Navigation | 2020 | SIGMOD | 7.1784949e-05 |
| 3,750 | Data Acquisition for Improving Machine Learning Models | 2021 | VLDB | 6.7895763e-05 |
| 3,824 | Correlation Sketches for Approximate Join-Correlation Queries | 2021 | SIGMOD | 6.7260705e-05 |
| 4,375 | Sample Debiasing in the Themis Open World Database System | 2020 | SIGMOD | 6.2427076e-05 |
| 4,426 | Data Debugging and Exploration with Vizier | 2019 | SIGMOD | 6.1969994e-05 |
| 4,749 | Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models | 2021 | SIGMOD | 5.9503689e-05 |
| 5,362 | Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach | 2016 | SIGMOD | 5.5473503e-05 |
| 5,963 | Automatic Data Acquisition for Deep Learning | 2021 | VLDB | 5.2526794e-05 |
| 6,438 | RONIN: Data Lake Exploration | 2021 | VLDB | 5.0620163e-05 |
| 6,467 | Tailoring Data Source Distributions for Fairness-aware Data Integration | 2021 | VLDB | 5.0528156e-05 |
| 6,526 | Data Collection and Quality Challenges for Deep Learning | 2020 | VLDB | 5.0267429e-05 |
| 6,892 | Identifying Insufficient Data Coverage for Ordinal Continuous-Valued Attributes | 2021 | SIGMOD | 4.8925683e-05 |
| 7,685 | Fairly Evaluating and Scoring Items in a Data Set | 2020 | VLDB | 4.6788921e-05 |
| 7,714 | Identifying Insufficient Data Coverage in Databases with Multiple Relations | 2020 | VLDB | 4.6700455e-05 |
| 8,346 | Deep Learning: Systems and Responsibility | 2021 | SIGMOD | 4.5420668e-05 |
| 9,177 | Cost-efficient Data Acquisition on Online Data Marketplaces for Correlation Analysis | 2019 | VLDB | 4.3834281e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,526 | Responsible Data Science | 2019 | SIGMOD | 6.1092845e-05 |
| 9,777 | Data Augmentation for ML-driven Data Preparation and Integration | 2021 | VLDB | 4.2856106e-05 |
| 507 | Data Quality and Data Cleaning: An Overview | 2003 | SIGMOD | 0.00021473263 |
| 1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956 |
| 1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681 |
| 6,526 | Data Collection and Quality Challenges for Deep Learning | 2020 | VLDB | 5.0267429e-05 |
| 1,404 | Responsible Data Management | 2020 | VLDB | 0.00012174977 |
| 13,292 | The Responsibility Challenge for Data | 2019 | SIGMOD | - |
| 4,607 | Data Integration and Machine Learning: A Natural Synergy | 2018 | SIGMOD | 6.0538827e-05 |
| 7,243 | Data Integration and Machine Learning: A Natural Synergy | 2018 | VLDB | 4.7913666e-05 |