
I have a schema for which I would like to generate fake data. I can generate fake data using Python for each table separately, but I don't know how to ensure the generated data adheres to the foreign-key constraints I have set between the tables.

I would like to generate 1 million rows for each of the tables. My system is Ubuntu 18.04 LTS.

I came across Red Gate's SQL generator, but it is only for SQL Server and available only on Windows. SQL Server is currently not available for Ubuntu 18.04 either. I currently have PostgreSQL and MySQL on my system.

I can install http://www.generatedata.com locally (I haven't tried it yet), but I still can't figure out how to make the data generation adhere to the PK-FK constraints.

1 Answer

Basically, chain data-modifying CTEs in PostgreSQL for this purpose. Details very much depend on the undisclosed specifics of your setup and requirements, and the meaning of "random" varies wildly.

Minimal demo:

CREATE TABLE fk_tbl(fk_id serial PRIMARY KEY, data int);
CREATE TABLE pk_tbl(pk_id serial PRIMARY KEY, fk_id int REFERENCES fk_tbl, data int);

WITH ins1 AS (
   INSERT INTO fk_tbl(data)
   SELECT trunc(random() * 10000)::int  -- or whatever
   FROM   generate_series(1, 10) g      -- or 1000000 instead of 10?
   RETURNING fk_id
   )
INSERT INTO pk_tbl(fk_id, data)
SELECT fk_id
     , trunc(random() * 55555)::int     -- or whatever
FROM   ins1
ORDER  BY random();                     -- optional
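To scale toward the 1 million rows from the question, a second statement can fan out over the keys that are already there. This is only a sketch, assuming the serial keys in fk_tbl are gapless (which holds for the freshly filled demo table; with gaps, pick keys from fk_tbl itself instead):

INSERT INTO pk_tbl(fk_id, data)
SELECT trunc(random() * 10)::int + 1    -- random existing key in 1..10
     , trunc(random() * 55555)::int     -- or whatever
FROM   generate_series(1, 1000000) g;   -- 1 million referencing rows

Since random() returns a value in [0,1), trunc(random() * 10)::int + 1 yields an integer in 1..10, so every generated row references an existing parent and the FK constraint is satisfied.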

db<>fiddle here
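A quick sanity check: an anti-join should find no orphaned rows (the FK constraint already guarantees this, so the count must be 0):

SELECT count(*) AS orphans
FROM   pk_tbl p
LEFT   JOIN fk_tbl f USING (fk_id)
WHERE  f.fk_id IS NULL;   -- expect 0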

Erwin Brandstetter
  • Hi, thank you for the answer. This is basically a personal project of mine for which I would like to generate fake data. I came up with the schema and was thinking of generating data according to it. But I am now re-evaluating whether I should generate the data first, then split it into normal forms etc. to fit the schema. – scientific_explorer Jul 31 '19 at 02:39