
I have a very large file (999,952,379,904 bytes to be precise), which is a partial disk image. Looking at it in a hex editor, I have found that the structure is as follows:

  • Byte offsets 0-2073 contain header information added by the tool which created the file. I can ignore them.

  • From byte 2074 onward, the file consists of blocks of 1048580 bytes, each comprising 1048576 bytes of data followed by a 4-byte CRC value.

I'm looking for an efficient way to start from an offset of 2074 bytes and copy the 1048580-byte blocks, excluding the 4-byte CRC values, up to the end of the input file. It looks like the file doesn't end with a complete block, so I would either exclude the last block or pad it with zeroes.

dd can clearly accommodate starting from an offset, but is there any way to exclude the last 4 bytes in every block when copying?

Tinkerer

1 Answer


You would probably have to create a bash loop and have dd skip the required bytes for each block.
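If you do want the pure-dd route, a sketch like the following might work. It assumes GNU dd (for `iflag=skip_bytes`), and the input/output filenames are placeholders; note that it spawns one dd process per block, so it will be slow on a ~1 TB image:

```shell
#!/bin/sh
# Sketch of a dd-per-block loop (GNU dd assumed, for iflag=skip_bytes).
# Filenames are placeholders; this excludes an incomplete final block.
IN=largefile.raw        # input image (name assumed)
OUT=filtered-file.dd
HEADER=2074             # header bytes to skip once
DATA=1048576            # payload bytes per block
CRC=4                   # trailing CRC bytes per block

offset=$HEADER
: > "$OUT"
while :; do
  # copy one payload's worth of bytes starting at $offset
  dd if="$IN" of=block.tmp bs="$DATA" count=1 \
     iflag=skip_bytes skip="$offset" 2>/dev/null
  n=$(wc -c < block.tmp)
  if [ "$n" -lt "$DATA" ]; then   # incomplete final block: exclude it
    break
  fi
  cat block.tmp >> "$OUT"
  offset=$(( offset + DATA + CRC ))  # jump over the CRC to the next block
done
rm -f block.tmp
```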

Writing a simple C program is easier.

$ cat >cvt.c
#include <unistd.h>
#include <string.h>

#define SKIPBYTES (2074)

#define BUFSIZE (1048580)
#define STRIPBYTES (4)

/* Read exactly n bytes, or as many as remain before EOF.
   A single read() may return fewer bytes than requested, so loop. */
static ssize_t read_full(int fd, char *buf, size_t n)
{
  size_t got = 0;
  while (got < n)
    {
    ssize_t r = read(fd, buf + got, n - got);
    if (r <= 0)                          /* EOF or error */
      break;
    got += (size_t)r;
    }
  return (ssize_t)got;
}

int main(void)
{
  char buf[BUFSIZE];                     /* buffer to hold one block of data to transfer */
  ssize_t count;

  read_full(0, buf, SKIPBYTES);          /* read and discard the initial header */

  while (1)
    {
    memset(buf, 0, BUFSIZE);             /* zero-fill so a short last block is padded */
    count = read_full(0, buf, BUFSIZE);  /* possibly read a full block */
    if (count <= 0)
      break;
    write(1, buf, BUFSIZE - STRIPBYTES); /* write the data, drop the 4-byte CRC */
    }

  return 0;
}

Press Ctrl+D once to end the input to cat.

$ gcc -o cvt cvt.c
$ chmod 755 cvt
$ ./cvt <largefile.raw >filtered-file.dd

Note: file descriptor ("fd") 0 is stdin, 1 is stdout, and 2 is stderr.

Check:
$ man read
$ man 2 write
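As a sanity check, the figures from the question let you predict the output size in advance (assuming the incomplete last block is excluded rather than padded):

```shell
total=999952379904   # input file size, from the question
header=2074          # header bytes to skip
block=1048580        # data + CRC per block
data=1048576         # data only

payload=$(( total - header ))
full=$(( payload / block ))   # number of complete blocks
rest=$(( payload % block ))   # bytes in the incomplete last block
echo "$full full blocks, $rest leftover bytes"
echo "expected output size: $(( full * data )) bytes"
```

This confirms the question's observation that the file does not end on a block boundary.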

Hannu