DataSynth: Generating Synthetic Data using Declarative Constraints
Summary: DataSynth uses a declarative abstraction based on cardinality constraints to specify complex characteristics of synthetic data. Efficient generation algorithms make it possible to produce realistic database instances for testing, masking, and benchmarking; the demo covers two real-world scenarios. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Arvind Arasu
2. Raghav Kaushik
3. Jian Li
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,836 | Projection-Compliant Database Generation | 2022 | VLDB | 4.2747054e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 145 | Quickly Generating Billion-Record Synthetic Databases | 1994 | SIGMOD | 0.0004138408 |
| 512 | STHoles: A Multidimensional Workload-Aware Histogram | 2001 | SIGMOD | 0.00021380733 |
| 888 | QAGen: Generating Query-Aware Test Databases | 2007 | SIGMOD | 0.00015578618 |
| 934 | Flexible Database Generators | 2005 | VLDB | 0.00015227409 |
| 2,291 | Data Generation using Declarative Constraints | 2011 | SIGMOD | 9.0926719e-05 |
| 4,517 | Generating Databases for Query Workloads | 2010 | VLDB | 6.1178732e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,895 | HYDRA: A Dynamic Big Data Regenerator | 2018 | VLDB | 4.623701e-05 |
| 3,831 | Kamino: Constraint-Aware Differentially Private Data Synthesis | 2021 | VLDB | 6.7181688e-05 |
| 8,625 | Generating Flexible Workloads for Graph Databases | 2016 | VLDB | 4.4830029e-05 |
| 6,887 | Synthesizing Linked Data Under Cardinality and Integrity Constraints | 2021 | SIGMOD | 4.8937852e-05 |
| 8,699 | Supporting Database Constraints in Synthetic Data Generation based on Generative Adversarial Networks | 2020 | SIGMOD | 4.465684e-05 |
| 7,502 | PSynDB: Accurate and Accessible Private Data Generation | 2019 | VLDB | 4.7180617e-05 |
| 13,274 | Synner: Generating Realistic Synthetic Data | 2020 | SIGMOD | - |
| 934 | Flexible Database Generators | 2005 | VLDB | 0.00015227409 |
| 6,234 | Just can't get enough - Synthesizing Big Data | 2015 | SIGMOD | 5.1451686e-05 |
| 2,291 | Data Generation using Declarative Constraints | 2011 | SIGMOD | 9.0926719e-05 |