Data Profiling – A Tutorial
Summary: A tutorial on data profiling that covers metadata discovery, profiling tasks, and a survey of relational profiling systems. It addresses hard problems, including dependency discovery, dynamic/streaming data, and visualization and interpretation, and outlines future research directions.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 5330
- Venue: SIGMOD
- Year: 2017
- Pagerank: 7.069081e-05
- Overall Rank: 3,467 | 75.89%
- DOI: 10.1145/3035918.3054772
Incoming Citations (Sorted by Pagerank)
Showing 12 of 12 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,756 | Fast Incremental Discovery of Pointwise Order Dependencies | 2020 | VLDB | 4.9379361e-05 |
| 8,472 | Rapidash: Efficient Detection of Constraint Violations | 2024 | VLDB | 4.5036378e-05 |
| 8,475 | DataProf: Semantic Profiling for Iterative Data Cleansing and Business Rule Acquisition | 2018 | SIGMOD | 4.5028904e-05 |
| 8,743 | CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning | 2024 | SIGMOD | 4.456315e-05 |
| 8,836 | Fast Approximate Denial Constraint Discovery | 2023 | VLDB | 4.4393184e-05 |
| 9,151 | The Power of Constraints in Natural Language to SQL Translation | 2025 | VLDB | 4.3849295e-05 |
| 9,379 | GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example | 2023 | SIGMOD | 4.3462787e-05 |
| 9,749 | Efficient Differential Dependency Discovery | 2024 | VLDB | 4.2897489e-05 |
| 10,540 | Discovering Approximate Inclusion Dependencies | 2025 | VLDB | 4.1945683e-05 |
| 10,587 | Efficient Discovery of Relaxed Functional Dependencies | 2025 | VLDB | 4.1945683e-05 |
| 10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05 |
| 11,462 | INCA: Inconsistency-Aware Data Profiling and Querying | 2021 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 19 of 19 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 112 | Potter's Wheel: An Interactive Data Cleaning System | 2001 | VLDB | 0.00047045036 |
| 140 | The MADlib Analytics Library or MAD Skills, the SQL | 2012 | VLDB | 0.00042270404 |
| 224 | CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies | 2004 | SIGMOD | 0.00032746205 |
| 475 | Mining Database Structure; Or, How to Build a Data Quality Browser | 2002 | SIGMOD | 0.00022303253 |
| 555 | Discovering Denial Constraints | 2013 | VLDB | 0.00020254908 |
| 610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674 |
| 894 | A Hybrid Approach to Functional Dependency Discovery | 2016 | SIGMOD | 0.00015556428 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 1,401 | Extending Dependencies with Conditions | 2007 | VLDB | 0.00012187775 |
| 1,625 | Data Profiling with Metanome | 2015 | VLDB | 0.00011094926 |
| 1,908 | Information-Theoretic Tools for Mining Database Structure from Large Data Sets | 2004 | SIGMOD | 0.00010126101 |
| 2,159 | Sequential Dependencies | 2009 | VLDB | 9.4130956e-05 |
| 4,682 | Scalable Discovery of Unique Column Combinations | 2014 | VLDB | 6.0022412e-05 |
| 4,744 | Effective and Complete Discovery of Order Dependencies via Set-based Axiomatization | 2017 | VLDB | 5.957936e-05 |
| 4,784 | Divide & Conquer-based Inclusion Dependency Discovery | 2015 | VLDB | 5.9240851e-05 |
| 4,904 | Temporal Rules Discovery for Web Data Cleaning | 2016 | VLDB | 5.8399195e-05 |
| 4,929 | Data Auditor: Exploring Data Quality and Semantics using Pattern Tableaux | 2010 | VLDB | 5.8217296e-05 |
| 5,577 | Challenges and Opportunities with Big Data | 2012 | VLDB | 5.4259878e-05 |
| 6,437 | Fundamentals of Order Dependencies | 2012 | VLDB | 5.0631488e-05 |
Semantically Similar Papers