Wisteria: Nurturing Scalable Data Cleaning Infrastructure
Summary: Wisteria separates logical cleaning operations from their physical implementations, so analysts can iterate on a cleaning workflow over samples before running it on the full data. It supports crowdsourced operators and in-flight operator replacement, and uses sampling-driven optimization to suggest replacements in response to analyst feedback. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
1. Daniel Haas
2. Sanjay Krishnan
3. Jiannan Wang
4. Michael J. Franklin
5. Eugene Wu
Incoming Citations (Sorted by Pagerank)
Showing 8 of 8 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 2,175 | Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services | 2017 | SIGMOD | 9.3644117e-05 |
| 2,280 | SMOKE: Fine-grained Lineage at Interactive Speed | 2018 | VLDB | 9.1111033e-05 |
| 4,451 | CLAMShell: Speeding up Crowds for Low-latency Data Labeling | 2016 | VLDB | 6.1738675e-05 |
| 4,668 | PrivateClean: Data Cleaning and Differential Privacy | 2016 | SIGMOD | 6.0115918e-05 |
| 5,929 | ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning | 2016 | SIGMOD | 5.2682177e-05 |
| 7,237 | CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning | 2017 | VLDB | 4.7928651e-05 |
| 10,886 | FaDE: More Than a Million What-ifs Per Second | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 489 | Data Curation at Scale: The Data Tamer System | 2013 | CIDR | 0.00022030728 |
| 643 | Corleone: Hands-Off Crowdsourcing for Entity Matching | 2014 | SIGMOD | 0.00018754451 |
| 656 | ERACER: A Database Approach for Statistical Inference and Data Cleaning | 2010 | SIGMOD | 0.00018588729 |
| 1,012 | NADEEF: A Commodity Data Cleaning System | 2013 | SIGMOD | 0.0001464733 |
| 2,184 | A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data | 2014 | SIGMOD | 9.3429789e-05 |
| 3,067 | CrowdFill: Collecting Structured Data from the Crowd | 2014 | SIGMOD | 7.6180371e-05 |
| 8,728 | Stale View Cleaning: Getting Fresh Answers from Stale Materialized Views | 2015 | VLDB | 4.4589711e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 199 | Declarative Data Cleaning: Language, Model, and Algorithms | 2001 | VLDB | 0.00035041015 |
| 9,500 | Arachnid: Generalized Visual Data Cleaning | 2019 | SIGMOD | 4.3341665e-05 |
| 7,384 | The VADA Architecture for Cost-Effective Data Wrangling | 2017 | SIGMOD | 4.7445719e-05 |
| 13,232 | Data Cleaning in the Era of Data Science: Challenges and Opportunities | 2021 | CIDR | - |
| 6,384 | A Demonstration of DBWipes: Clean as You Query | 2012 | VLDB | 5.0880333e-05 |
| 8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05 |
| 4,273 | Cleaning Denial Constraint Violations through Relaxation | 2020 | SIGMOD | 6.3003864e-05 |
| 7,237 | CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning | 2017 | VLDB | 4.7928651e-05 |
| 9,221 | VisClean: Interactive Cleaning for Progressive Visualization | 2020 | VLDB | 4.3699444e-05 |
| 11,515 | From Papers to Practice: The openclean Open-Source Data Cleaning Library | 2021 | VLDB | 4.1945683e-05 |