Just can't get enough - Synthesizing Big Data
Summary: DBSynth automatically generates realistic, large-scale synthetic data from a database schema and sample data by extracting value-level features and building Markov models. Built as an extension to PDGF, it supports fast, scalable generation in several output formats (CSV, JSON, XML, SQL) for big-data benchmarks such as TPC-DI and BigBench, with automatically built dictionaries and multi-core speedups. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Tilmann Rabl
2. Manuel Danisch
3. Michael Frank
4. Sebastian Schindler
5. Hans-Arno Jacobsen
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,887 | Synthesizing Linked Data Under Cardinality and Integrity Constraints | 2021 | SIGMOD | 4.8937852e-05 |
| 7,759 | Dscaler: Synthetically Scaling A Given Relational Database | 2016 | VLDB | 4.6593145e-05 |
| 7,895 | HYDRA: A Dynamic Big Data Regenerator | 2018 | VLDB | 4.623701e-05 |
| 9,836 | Projection-Compliant Database Generation | 2022 | VLDB | 4.2747054e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 145 | Quickly Generating Billion-Record Synthetic Databases | 1994 | SIGMOD | 0.0004138408 |
| 888 | QAGen: Generating Query-Aware Test Databases | 2007 | SIGMOD | 0.00015578618 |
| 934 | Flexible Database Generators | 2005 | VLDB | 0.00015227409 |
| 1,483 | Simple and Realistic Data Generation | 2006 | VLDB | 0.00011720317 |
| 1,727 | BigBench: Towards an Industry Standard Benchmark for Big Data Analytics | 2013 | SIGMOD | 0.00010740936 |
| 2,291 | Data Generation using Declarative Constraints | 2011 | SIGMOD | 9.0926719e-05 |
| 4,517 | Generating Databases for Query Workloads | 2010 | VLDB | 6.1178732e-05 |
| 5,114 | TPC-DI: The First Industry Benchmark for Data Integration | 2014 | VLDB | 5.6863051e-05 |
| 7,217 | Myriad: Scalable and Expressive Data Generation | 2012 | VLDB | 4.7983955e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,884 | Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration | 2020 | VLDB | 5.8540287e-05 |
| 8,625 | Generating Flexible Workloads for Graph Databases | 2016 | VLDB | 4.4830029e-05 |
| 8,699 | Supporting Database Constraints in Synthetic Data Generation based on Generative Adversarial Networks | 2020 | SIGMOD | 4.465684e-05 |
| 11,888 | Synthesizing Data Programs | 2015 | CIDR | 4.1945683e-05 |
| 6,456 | From Auto-tuning One Size Fits All to Self-designed and Learned Data-intensive Systems | 2019 | SIGMOD | 5.0564619e-05 |
| 934 | Flexible Database Generators | 2005 | VLDB | 0.00015227409 |
| 1,727 | BigBench: Towards an Industry Standard Benchmark for Big Data Analytics | 2013 | SIGMOD | 0.00010740936 |
| 2,291 | Data Generation using Declarative Constraints | 2011 | SIGMOD | 9.0926719e-05 |
| 145 | Quickly Generating Billion-Record Synthetic Databases | 1994 | SIGMOD | 0.0004138408 |
| 8,870 | DataSynth: Generating Synthetic Data using Declarative Constraints | 2011 | VLDB | 4.431665e-05 |