I have an incredibly large tarball. I'd like to extract several files out of the many thousands within the archive. I'm on CentOS 6.10 running GPFS 4.2.3. I've seen from this answer that pigz is useful for extracting the entire tarball, but extracting the entire tarball is not an option because it would take up terabytes worth of space.
I've tried something like:
$ pigz -dc ../test.tar.gz | tar xf test/analysis/something/dist.txt
tar: test/analysis/something/dist.txt: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
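The error makes more sense once the command is unpacked: with tar xf, the argument right after f names the archive itself, so the pipeline above is equivalent to tar ignoring stdin entirely and running:

$ tar -x -f test/analysis/something/dist.txt

That is, tar tried to open dist.txt as an archive, hence "Cannot open: No such file or directory".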
So I'm not exactly sure how to pass test/analysis/something/dist.txt to tar as the file to extract while piping the output of pigz into it. My intuition says to use xargs, but that also fails:
$ pigz -dc ../test.tar.gz | xargs -I var | tar xf var test/analysis/something/dist.txt
tar: var: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
xargs: Warning: a NUL character occurred in the input. It cannot be passed through in the argument list. Did you mean to use the --null option?
xargs: /bin/echo: terminated by signal 13
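After more digging in the tar man page, the missing piece seems to be that giving - as the archive name makes tar read the archive from standard input; the member path then follows as an ordinary argument. A minimal sketch, assuming GNU tar and that the path matches the stored member name exactly:

$ pigz -dc ../test.tar.gz | tar xf - test/analysis/something/dist.txt

tar still scans the whole stream to find the member, but nothing is buffered in memory and only dist.txt is written to disk.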
QUESTION
- How do I quickly extract a single file from a large tarball using pigz?
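On the "quickly" part: GNU tar has an --occurrence option that, when extracting explicitly named members, stops reading once they have been found, so the rest of a multi-terabyte stream is never decompressed. Assuming the tar shipped with CentOS 6 supports it, a sketch:

$ pigz -dc ../test.tar.gz | tar -x --occurrence=1 -f - test/analysis/something/dist.txt

When tar exits early, pigz is killed by SIGPIPE; that is expected and harmless here, although the shell may report a non-zero exit status for the pipeline.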
"However, basically you are decompressing the file into your memory, so unless you have terabytes of memory available, it will fill up even faster than decompressing to disk." It will not. Both gzip/pigz and tar are perfectly able to operate on streaming data, and neither will consume much memory. pigz feeds the pipe, and if tar cannot take the data fast enough, pigz waits. Similarly, tar reads the archive and discards everything that was not asked for (here: dist.txt), writing only the requested file to disk. That's it. – tansy Jul 08 '22 at 15:50
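Building on that comment: because everything streams, a single decompression pass can also serve several members at once, which matters when each pass over a multi-terabyte archive is expensive. A sketch, again assuming GNU tar; members.txt is a hypothetical file listing the exact member paths, one per line:

$ pigz -dc ../test.tar.gz | tar -x -f - -T members.txt

Each listed member is written to disk as tar encounters it; everything else is discarded on the fly.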