CLAMS: Bringing Quality to Data Lakes
Summary: CLAMS discovers and enforces expressive integrity constraints over data-lake data with limited schema, inferring both schemas and constraints from RDF-like triples. It scales out, supports human-in-the-loop validation and repair, and was demonstrated on a real enterprise data lake with 1.2B triples. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Mina Farid
2. Alexandra Roatis
3. Ihab F. Ilyas
4. Hella-Franziska Hoffmann
5. Xu Chu
Incoming Citations (Sorted by Pagerank)
Showing 7 of 7 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344 |
| 2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05 |
| 3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05 |
| 4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05 |
| 7,384 | The VADA Architecture for Cost-Effective Data Wrangling | 2017 | SIGMOD | 4.7445719e-05 |
| 8,729 | OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs | 2023 | VLDB | 4.4582221e-05 |
| 9,316 | READY: Completeness is in the Eye of the Beholder | 2017 | CIDR | 4.3559005e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 2 of 2 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 555 | Discovering Denial Constraints | 2013 | VLDB | 0.00020254908 |
| 5,660 | Descriptive and Prescriptive Data Cleaning | 2014 | SIGMOD | 5.3847321e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,278 | Interactive and Deterministic Data Cleaning: A Tossed Stone Raises a Thousand Ripples | 2016 | SIGMOD | 4.3639892e-05 |
| 10,022 | In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration | 2026 | SIGMOD | 4.1945683e-05 |
| 5,462 | RetClean: Retrieval-Based Data Cleaning Using LLMs and Data Lakes | 2024 | VLDB | 5.494769e-05 |
| 8,974 | DataLoom: Simplifying Data Loading with LLMs | 2024 | VLDB | 4.4184286e-05 |
| 3,281 | Constance: An Intelligent Data Lake System | 2016 | SIGMOD | 7.2823287e-05 |
| 13,277 | The Challenge of Building Effective Data Lakes | 2020 | SIGMOD | - |
| 8,917 | Data Lakes Empowered by Knowledge Graph Technologies | 2021 | SIGMOD | 4.427232e-05 |
| 2,946 | BigDansing: A System for Big Data Cleansing | 2015 | SIGMOD | 7.8372441e-05 |
| 7,237 | CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning | 2017 | VLDB | 4.7928651e-05 |
| 1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533 |