COUNTATA: Dataset Labeling Using Pattern Counts
Summary: Countata introduces a label-based compact summary of attribute-pattern counts to estimate multi-attribute frequencies without enumerating all combinations. It defines an estimation function mapping labels to counts, analyzes the label-size and error trade-off, and demonstrates the prototype on real data. (summarized by gpt-5-nano on Feb 09 2026)
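The idea of answering multi-attribute count queries from a compact label can be sketched as follows. This is a minimal illustration under assumptions of our own, not Countata's actual estimation function. The hypothetical label stores (1) exact pattern counts over a chosen subset of attributes and (2) per-attribute value counts. The hypothetical `estimate` function uses the stored counts for attributes the label covers and assumes the remaining attributes are independent of them. `max_error` shows the label-size versus error trade-off: a label over more attributes has more entries but gives smaller errors.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Row = Dict[str, str]
Pattern = Dict[str, str]  # attribute -> required value


def build_label(rows: List[Row], label_attrs: Tuple[str, ...]) -> dict:
    """Hypothetical label: joint pattern counts over label_attrs plus per-attribute value counts."""
    joint = Counter(tuple(r[a] for a in label_attrs) for r in rows)
    marginals = {a: Counter(r[a] for r in rows) for a in rows[0]}
    return {"attrs": label_attrs, "joint": joint, "marginals": marginals, "n": len(rows)}


def label_size(label: dict) -> int:
    """Number of stored joint pattern counts (the part that grows with the label's attributes)."""
    return len(label["joint"])


def estimate(label: dict, pattern: Pattern) -> float:
    """Estimate how many rows match `pattern` using only the label.

    Attributes covered by the label: sum the stored joint counts that agree with the pattern.
    Other attributes: multiply by their value frequency (independence assumption of this sketch).
    """
    attrs, n = label["attrs"], label["n"]
    covered = [a for a in attrs if a in pattern]
    idx = [attrs.index(a) for a in covered]
    base = sum(
        c for key, c in label["joint"].items()
        if all(key[i] == pattern[a] for i, a in zip(idx, covered))
    )
    est = float(base)
    for a, v in pattern.items():
        if a not in attrs:
            est *= label["marginals"][a][v] / n
    return est


def true_count(rows: List[Row], pattern: Pattern) -> int:
    """Exact count, by scanning the data."""
    return sum(all(r[a] == v for a, v in pattern.items()) for r in rows)


def max_error(rows: List[Row], label: dict, k: int = 2) -> float:
    """Largest absolute estimation error over all k-attribute patterns that occur in the data."""
    worst = 0.0
    for combo in combinations(sorted(rows[0]), k):
        for values in {tuple(r[a] for a in combo) for r in rows}:
            p = dict(zip(combo, values))
            worst = max(worst, abs(estimate(label, p) - true_count(rows, p)))
    return worst


# Toy data (hypothetical).
rows = [
    {"sex": "F", "age": "young", "race": "A"},
    {"sex": "F", "age": "old", "race": "B"},
    {"sex": "M", "age": "young", "race": "A"},
    {"sex": "M", "age": "young", "race": "B"},
]
small = build_label(rows, ("sex", "age"))
full = build_label(rows, ("age", "race", "sex"))
print(label_size(small), max_error(rows, small))  # 3 0.5
print(label_size(full), max_error(rows, full))    # 4 0.0
```

The label over two attributes stores 3 entries and misestimates some (age, race) patterns by 0.5 rows. The label over all three attributes stores 4 entries and is exact. On real data, choosing which attributes to include is what balances label size against error.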
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 1 of 1 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,191 | A Nutritional Label for Rankings | 2018 | SIGMOD | 4.8049534e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,288 | TATA: An Efficient Framework for Task Transfer in Query Plan Representation | 2026 | VLDB | 4.1945683e-05 |
| 9,388 | CEDA: Learned Cardinality Estimation with Domain Adaptation | 2023 | VLDB | 4.3443083e-05 |
| 7,251 | Learning to Sample: Counting with Complex Queries | 2020 | VLDB | 4.7890519e-05 |
| 6,805 | Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining | 1998 | VLDB | 4.9222308e-05 |
| 3,702 | Every Row Counts: Combining Sketches and Sampling for Accurate Group-By Result Estimates | 2019 | CIDR | 6.8295759e-05 |
| 2,334 | Counting with the Crowd | 2013 | VLDB | 9.0161817e-05 |
| 5,729 | KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing | 2015 | VLDB | 5.3506368e-05 |
| 11,304 | Bayesian Sketches for Volume Estimation in Data Streams | 2023 | VLDB | 4.1945683e-05 |
| 39 | Statistical Estimators for Relational Algebra Expressions | 1988 | PODS | 0.00074745564 |
| 6,244 | Approximate Distinct Counts for Billions of Datasets | 2019 | SIGMOD | 5.139669e-05 |