140

I am using the bash shell and would like to pipe the output of the command openssl rand -base64 1000 to the command dd, something like dd if={output of openssl} of=sample.txt bs=1G count=1. I think I can use variables, but I am unsure how best to do so. The reason I would like to create the file is that I want a 1GB file of random text.

  • What do you want to do with that file? To check e.g. compression algorithms, use the type of data they are designed for (natural-language text you can get by the boatload at Project Gutenberg; for source code, grab e.g. the GNU, BSD, or Sourceforge packages, or sample GitHub). "Real world" data is not random. – vonbrand May 24 '21 at 01:24

7 Answers

208

if= is not required, you can pipe something into dd instead:

something... | dd of=sample.txt bs=1G count=1 iflag=fullblock

something... | head -c 1G > sample.txt

It wouldn't be very useful here, though, since openssl rand requires specifying the number of bytes anyway. So you don't actually need dd; this would work:

openssl rand -out sample.txt -base64 $(( 2**30 * 3/4 ))

1 gigabyte is usually 2**30 bytes (though you can use 10**9 for a billion bytes instead). The * 3/4 part accounts for Base64 overhead: encoding expands data by a factor of 4/3, so requesting 3/4 as many raw bytes makes the encoded output about 1 GB.
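A quick sanity check of that arithmetic (a throwaway sketch; no files are created):

```shell
# 2**30 bytes is 1 GiB; base64 turns every 3 raw bytes into 4
# output characters, so asking openssl for 3/4 of the target size
# yields roughly 1 GiB of encoded text (line breaks add a bit more).
raw=$((2**30 * 3/4))
echo "$raw"               # 805306368 raw random bytes
echo $((raw * 4 / 3))     # 1073741824 = back to 1 GiB of base64
```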

Alternatively, you could use /dev/urandom, but it would be a little slower than OpenSSL:

dd if=/dev/urandom of=sample.txt bs=1G count=1 iflag=fullblock

I would use bs=64M count=16 or similar, so that 'dd' won't try to use the entire 1 GB of RAM at once:

dd if=/dev/urandom of=sample.txt bs=64M count=16 iflag=fullblock

or even the simpler head tool – you don't really need dd here:

head -c 1G /dev/urandom > sample.txt
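The same idea at 1/1024 scale, easier to try out (file name here is just an example):

```shell
# head -c accepts size suffixes (K, M, G), and /dev/urandom never
# hits EOF, so the output file is exactly the requested size.
head -c 1M /dev/urandom > sample_small.bin
wc -c < sample_small.bin   # 1048576
rm -f sample_small.bin
```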
hanshenrik
u1686_grawity
  • Thanks. A few questions, does using the command openssl rand -base64 $(( 2**30 * 3/4 )) > sample.txt give you a true text file? Secondly I don't quite follow the use of bs=64M count=16. Can you elaborate further? – PeanutsMonkey Sep 06 '12 at 19:06
  • 2
    I posted a question regarding compressing large files at http://superuser.com/questions/467697/why-does-a-zip-file-appear-larger-than-the-source-file-especially-when-it-is-tex and was advised that using /dev/urandom generates a binary file and not a true text file. – PeanutsMonkey Sep 06 '12 at 19:10
  • @PeanutsMonkey: What do you mean by a "true text file"? A file that only contains printable characters, I'm guessing? Then yes, the -base64 option tells OpenSSL to output a "text" file. – u1686_grawity Sep 06 '12 at 19:23
  • @PeanutsMonkey: But beware that random data does not compress well, regardless of whether it is "binary" or "true text". – u1686_grawity Sep 06 '12 at 19:23
  • 2
    @PeanutsMonkey: Right; you would need something like dd if=/dev/urandom bs=750M count=1 | uuencode my_sample > sample.txt. – Scott - Слава Україні Sep 06 '12 at 19:33
  • @Scott - Can you elaborate what that does exactly as well as why you are using a byte size of 750M and a count of 1? – PeanutsMonkey Sep 06 '12 at 19:52
  • @grawity - Well people keep bouncing the term "true text file" and based on my previous post it was suggested that /dev/urandom generates binary files. My understanding is that a text file is one with printable characters although am unsure whether ASCII characters would count. I thought -base64 is used to convert binary data to text? – PeanutsMonkey Sep 06 '12 at 19:56
  • @grawity - If random data does not compress well, how can I create a file that mimics real world scenarios? – PeanutsMonkey Sep 06 '12 at 19:56
  • 4
    @PeanutsMonkey: There's no single "real world scenario", some scenarios might be dealing with gigabytes of text, others – with gigabytes of JPEGs, or gigabytes of compiled software... If you want a lot of text, download a Wikipedia dump for example. – u1686_grawity Sep 06 '12 at 20:06
  • 2
    @PeanutsMonkey: The dd reads 750,000,000 bytes from /dev/urandom and pipes them into uuencode. uuencode encodes its input into a form of base64 encoding (not necessarily consistent with other programs). In other words, this converts binary data to text. I used 750M because I trusted grawity's statement that base64 encoding expands data by 33⅓%, so you need to ask for ¾ as much binary data as you want in your text file. – Scott - Слава Україні Sep 06 '12 at 20:07
  • @Scott: Pure Base64 always encodes 3 bytes to 4 (33.(3)%). OpenSSL's encoder splits output into 64-character lines (so about 35.4% overhead; I forgot to account for this – would be *48/65). UUencode uses even shorter lines and adds length prefixes, header & footer, resulting in ~40% overhead. – u1686_grawity Sep 06 '12 at 20:15
  • @Scott - That makes sense although am curious to understand why limit the count to 1? – PeanutsMonkey Sep 06 '12 at 20:41
  • @grawity - I am astounded by the depth of knowledge. Where are you learning all of this stuff? – PeanutsMonkey Sep 06 '12 at 20:42
  • 1
    @leighmcc: FYI: using > redirection does not make the writes pass through bash – it is equivalent to having the program open the file directly. – u1686_grawity May 10 '13 at 14:03
  • 7
    Note if it says dd: warning: partial read (33554431 bytes); suggest iflag=fullblock it will create a truncated file so add the iflag=fullblock flag, then it works. – rogerdpack Sep 27 '18 at 20:21
51

Create a 1GB.bin file with random content:

 dd if=/dev/urandom of=1GB.bin bs=64M count=16 iflag=fullblock
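A scaled-down version you can run quickly to see how bs and count multiply out (output name is a throwaway example):

```shell
# Same recipe at 1/1024 scale: 64K blocks x 16 = 1 MiB.
# iflag=fullblock makes dd retry short reads so every block is full.
dd if=/dev/urandom of=small.bin bs=64K count=16 iflag=fullblock 2>/dev/null
wc -c < small.bin   # 1048576
rm -f small.bin
```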
anneb
7

If you just need a somewhat random file that is not used for anything security-related, like benchmarking something, then the following will be significantly faster:

truncate --size 1G foo
shred --iterations 1 foo

It's also more convenient because you can simply specify the size directly.
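A small-scale run showing what the two commands do (file name is a throwaway example):

```shell
# truncate creates a sparse file of exactly the requested size;
# one shred pass then overwrites it in place with pseudo-random
# data, leaving the size unchanged.
truncate --size 1M foo
shred --iterations 1 foo
wc -c < foo   # 1048576
rm -f foo
```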

asynts
5

Since your goal is to create a 1GB file with random content, you could also use the yes command instead of dd:

yes [text or string] | head -c [size of file] > [name of file]

Sample usage:

yes this is test file | head -c 100KB > test.file
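Note that head -c cuts at an exact byte count regardless of where the repeated line ends, so the file is exactly the size asked for (a quick check, with throwaway names):

```shell
# "this is test file\n" is 18 bytes, which does not divide 1000
# evenly -- head simply truncates the last repetition mid-line.
yes "this is test file" | head -c 1000 > test.file
wc -c < test.file   # 1000
rm -f test.file
```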
4

If you want EXACTLY 1GB, then you can use the following:

openssl rand -out "$testfile" -base64 792917038; truncate -s-1 "$testfile"

The openssl command makes a file exactly 1 byte too big. The truncate command trims that byte off.
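The same off-by-one is easy to see at a small scale (demo file name is a throwaway example; 48 raw bytes encode to exactly one 64-character base64 line):

```shell
# base64 of 48 random bytes = 64 characters + a trailing newline
# = 65 bytes; truncate -s-1 strips that final newline.
openssl rand -out demo.txt -base64 48
wc -c < demo.txt    # 65
truncate -s-1 demo.txt
wc -c < demo.txt    # 64
rm -f demo.txt
```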

phuclv
  • That extra byte is probably because of the -base64. Removing it will result in a file with the correct size. – Daniel Oct 10 '19 at 11:34
0

You could set up a bash script for this, like the one below. Make sure the command 'pwgen' is installed:

#!/bin/bash

touch cypher-01.txt

pwgen -sy 1024512 1024 >> cypher-01.txt

... when this finishes, a file of roughly 1GB of random text has been created. On a computer with 4 cores this takes about ten to fourteen minutes. Reading the 1GB file with cat -A then takes about two and a half minutes.

  • Redirecting with >> or > will create the file if needed, touch is totally redundant. The whole idea of creating a script for the job seems like an unnecessary complication here, as the last command alone would suffice. – Kamil Maciorowski Nov 28 '23 at 23:56
-2

Try this script.

#!/bin/bash
openssl rand -base64 $(( 2**30 * 3/4 )) | dd of=sample.txt iflag=fullblock

This script might work as long as you don't mind using /dev/random.

#!/bin/bash
dd if=/dev/random of=sample.txt bs=1G count=1 iflag=fullblock
KevinOrr