A Grammar-based Entity Representation Framework for Data Cleaning
Summary: Presents a grammar-based entity representation framework for data cleaning that fuses generative grammar, database querying, and compiler-like actions to manipulate representations. An empirical study on real data shows that proper normalization often minimizes the need for further cleansing. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Arvind Arasu
2. Raghav Kaushik
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,181 | Active Knowledge: Dynamically Enriching RDF Knowledge Bases by Web Services | 2010 | SIGMOD | 5.6410659e-05 |
| 11,178 | LinCQA: Faster Consistent Query Answering with Linear Time Guarantees | 2023 | SIGMOD | 4.1945683e-05 |
| 11,216 | Demystifying the QoS and QoE of Edge-hosted Video Streaming Applications in the Wild with SNESet | 2023 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 155 | Robust and Efficient Fuzzy Match for Online Data Cleaning | 2003 | SIGMOD | 0.00040637896 |
| 322 | Record Linkage: Similarity Measures and Algorithms | 2006 | SIGMOD | 0.00027518768 |
| 637 | Automatic segmentation of text into structured records | 2001 | SIGMOD | 0.00018824614 |
| 1,147 | Web-scale Data Integration: You can only afford to Pay As You Go | 2007 | CIDR | 0.00013677658 |
| 3,267 | Benchmarking Declarative Approximate Selection Predicates | 2007 | SIGMOD | 7.3058429e-05 |
| 3,868 | An Efficient Filter for Approximate Membership Checking | 2008 | SIGMOD | 6.6822543e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,065 | Data-Driven Understanding and Refinement of Schema Mappings | 2001 | SIGMOD | 0.00014338146 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 6,846 | A framework for annotating CSV-like data | 2016 | VLDB | 4.9092462e-05 |
| 5,586 | QuERy: A Framework for Integrating Entity Resolution with Query Processing | 2016 | VLDB | 5.4219548e-05 |
| 2,460 | Combining Quantitative and Logical Data Cleaning | 2016 | VLDB | 8.7617484e-05 |
| 7,237 | CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning | 2017 | VLDB | 4.7928651e-05 |
| 5,660 | Descriptive and Prescriptive Data Cleaning | 2014 | SIGMOD | 5.3847321e-05 |
| 6,175 | Query-Driven Approach to Entity Resolution | 2013 | VLDB | 5.169496e-05 |
| 5,398 | Cleaning Inconsistencies in Information Extraction via Prioritized Repairs | 2014 | PODS | 5.5295577e-05 |
| 199 | Declarative Data Cleaning: Language, Model, and Algorithms | 2001 | VLDB | 0.00035041015 |