Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics
Summary: Data Civilizer 2.0 is an end-to-end workflow system that unifies data cleaning and ML development for integrated data preparation and analytics. It features a data debugger and workflow visualization, and is demonstrated on end-to-end cleaning and ML over a 30TB brain dataset at MGH. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. El Kindi Rezig
2. Lei Cao
3. Michael Stonebraker
4. Giovanni Simonini
5. Wenbo Tao
6. Samuel Madden
7. Mourad Ouzzani
8. Nan Tang
9. Ahmed K. Elmagarmid
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,541 | Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes | 2023 | CIDR | 0.00011456579 |
| 4,935 | OmniFair: A Declarative System for Model-Agnostic Group Fairness in Machine Learning | 2021 | SIGMOD | 5.8198727e-05 |
| 5,684 | Dagger: A Data (not code) Debugger | 2020 | CIDR | 5.3720749e-05 |
| 9,306 | Debugging Large-Scale Data Science Pipelines using Dagger | 2020 | VLDB | 4.3572942e-05 |
| 10,610 | Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 712 | Magellan: Toward Building Entity Matching Management Systems | 2016 | VLDB | 0.00017732426 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 3,023 | Helix: Accelerating Human-in-the-loop Machine Learning | 2018 | VLDB | 7.6929986e-05 |
| 5,058 | A Demo of the Data Civilizer System | 2017 | SIGMOD | 5.7280139e-05 |
| 5,370 | Kyrix: Interactive Visual Data Exploration at Scale | 2019 | CIDR | 5.5432976e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,138 | Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization | 2019 | VLDB | 4.8216981e-05 |
| 6,526 | Data Collection and Quality Challenges for Deep Learning | 2020 | VLDB | 5.0267429e-05 |
| 4,426 | Data Debugging and Exploration with Vizier | 2019 | SIGMOD | 6.1969994e-05 |
| 8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05 |
| 5,929 | ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning | 2016 | SIGMOD | 5.2682177e-05 |
| 5,684 | Dagger: A Data (not code) Debugger | 2020 | CIDR | 5.3720749e-05 |
| 11,515 | From Papers to Practice: The openclean Open-Source Data Cleaning Library | 2021 | VLDB | 4.1945683e-05 |
| 13,232 | Data Cleaning in the Era of Data Science: Challenges and Opportunities | 2021 | CIDR | - |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 5,058 | A Demo of the Data Civilizer System | 2017 | SIGMOD | 5.7280139e-05 |