Scalable Discovery of Unique Column Combinations
Summary: Ducc scales the discovery of all unique and non-unique column combinations by framing it as a graph-coloring problem and applying hybrid column-based pruning that combines depth-first search with a random walk. Together with row-based pruning and scale-out deployment, it runs up to 631× faster than Gordian and 398× faster than HCA on multi-million-row datasets. (summarized by gpt-5-nano on Feb 09 2026)
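To make the problem statement concrete, here is a minimal Python sketch of what discovering minimal unique column combinations means. It is not Ducc: it uses a plain level-wise (bottom-up) search rather than the hybrid depth-first/random-walk traversal, and it keeps only the simplest column-based pruning rule (a superset of a known unique cannot be minimal, so it is skipped). The function names `is_unique` and `minimal_uniques` and the toy rows are assumptions made for illustration only.

```python
from itertools import combinations


def is_unique(rows, cols):
    """Return True if no two rows share the same values on the given columns."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_uniques(rows, num_cols):
    """Naive level-wise search for minimal unique column combinations.

    Illustrative only: this is NOT the Ducc traversal, just a baseline
    that shows the pruning idea on a tiny table.
    """
    found = []
    for size in range(1, num_cols + 1):
        for cols in combinations(range(num_cols), size):
            # Column-based pruning: a superset of a known unique is
            # itself unique but not minimal, so skip it without scanning rows.
            if any(set(u) <= set(cols) for u in found):
                continue
            if is_unique(rows, cols):
                found.append(cols)
    return found


# Hypothetical toy table with three columns (indices 0, 1, 2).
rows = [
    ("a", 1, "x"),
    ("a", 2, "x"),
    ("b", 1, "y"),
]

print(minimal_uniques(rows, 3))  # [(0, 1), (1, 2)]
```

No single column is unique in this table, while the pairs (0, 1) and (1, 2) are, so the full three-column combination is pruned without a row scan. The number of candidate combinations grows exponentially with the column count, which is why stronger pruning and traversal strategies matter at scale.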
Incoming Citations (Sorted by Pagerank)
Showing 17 of 17 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 86 | The End of an Architectural Era (It's Time for a Complete Rewrite) | 2007 | VLDB | 0.00052563276 |
| 224 | CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies | 2004 | SIGMOD | 0.00032746205 |
| 560 | Dependencies Revisited for Improving Data Quality | 2008 | PODS | 0.00020141923 |
| 1,188 | On Generating Near-Optimal Tableaux for Conditional Functional Dependencies | 2008 | VLDB | 0.00013441729 |
| 1,401 | Extending Dependencies with Conditions | 2007 | VLDB | 0.00012187775 |
| 1,664 | On Multi-Column Foreign Key Discovery | 2010 | VLDB | 0.00010976887 |
| 1,974 | BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data | 2003 | VLDB | 9.8866171e-05 |
| 2,266 | Estimating the Confidence of Conditional Functional Dependencies | 2009 | SIGMOD | 9.1540815e-05 |
| 2,549 | GORDIAN: Efficient and Scalable Discovery of Composite Keys | 2006 | VLDB | 8.5641554e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,619 | Crowd-Based Deduplication: An Adaptive Approach | 2015 | SIGMOD | 6.0444854e-05 |
| 10,372 | Data Chunk Compaction in Vectorized Execution | 2025 | SIGMOD | 4.1945683e-05 |
| 3,360 | Modeling and Querying Possible Repairs in Duplicate Detection | 2009 | VLDB | 7.1742067e-05 |
| 6,690 | Parallel Discrepancy Detection and Incremental Detection | 2021 | VLDB | 4.9621556e-05 |
| 7,588 | Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases | 2013 | VLDB | 4.7030914e-05 |
| 936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549 |
| 3,528 | Distributed Data Deduplication | 2016 | VLDB | 7.0066139e-05 |
| 5,053 | DunceCap: Query Plans Using Generalized Hypertree Decompositions | 2015 | SIGMOD | 5.7323846e-05 |
| 8,850 | Hitting Set Enumeration with Partial Information for Unique Column Combination Discovery | 2020 | VLDB | 4.4364648e-05 |
| 8,085 | Discovery and Ranking of Embedded Uniqueness Constraints | 2019 | VLDB | 4.5902231e-05 |