Manipulating Tar Files
October 22 2018
We're big proponents of archiving here at NCGAS, but sometimes it can be inconvenient to unpack a full project folder (100GB can take a while) just to get one file. Here are some tips for working with tar files that will help you get at what's in your archives without unpacking everything.
NOTE: You will still need to pull the tar.gz file from the tape archive onto DC2 or your home directory or your personal computer (as space allows and likely using GLOBUS) before you can work with the compressed file. These commands won't work inside HSI, so you will still need to move them - the main benefit here is space and time!
To see what's in a tar without opening it:
tar tf file.tar.gz
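This is also useful when you are wondering whether a file is in the archived tar file. You can search for a filename (e.g. fileIneed.fa) by adding in a little grep. Either of the following works, although the less version only works where less is set up with a preprocessor (such as lesspipe) that understands tar files, so the tar version is the safer bet: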
less file.tar.gz | grep "fileIneed.fa"
tar tf file.tar.gz | grep "fileIneed.fa"
If your file is in file.tar.gz, the command prints its path inside the archive. If nothing is returned, the file is not in there (or you have the name wrong).
Once you locate your file, it can be REALLY useful to extract just that one file from a tar (the archive itself is left untouched):
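If you want to do this check inside a script, one option is to rely on grep's exit status rather than reading its output. The following is a minimal sketch, not a recipe from NCGAS: it builds a tiny throwaway archive in a scratch directory so it can be run anywhere, and the names workdir and found are made up for the illustration.

```shell
# Build a tiny demo archive in a scratch directory (all names are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/rawdata/sample2"
echo ">seq1" > "$workdir/rawdata/sample2/fileIneed.fa"
tar -czf "$workdir/file.tar.gz" -C "$workdir" rawdata

# grep -q prints nothing; its exit status tells us whether the name was listed.
# -F treats the pattern as a plain string, so the "." is not a wildcard.
if tar -tf "$workdir/file.tar.gz" | grep -qF "fileIneed.fa"; then
    found=yes
else
    found=no
fi
echo "fileIneed.fa in archive: $found"
```

On a real archive you would skip the setup lines and point tar -tf at your own file.tar.gz.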
tar xf file.tar.gz fileIneed.fa
There is one catch if you are working with a large archive: if you archived a full set of folders and the file you want sits in subfolders (e.g. rawdata/sample2/file.fa), you need to give the full path as it appears in the archive:
tar xf file.tar.gz rawdata/sample2/file.fa
But... it will write the file to that same path, creating rawdata/sample2/file.fa under your current directory rather than just file.fa. It will also overwrite any file that already exists at that path.
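If the overwrite is what worries you, one alternative that is not covered above is tar's -O (--to-stdout) option, which sends the file's contents to standard output so you can redirect them wherever you like. This is a hedged sketch on a throwaway archive; workdir, copy_of_file.fa and first_line are hypothetical names chosen for the example.

```shell
# Build a tiny demo archive in a scratch directory (all names are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/rawdata/sample2"
printf '>seq1\nACGT\n' > "$workdir/rawdata/sample2/file.fa"
tar -czf "$workdir/file.tar.gz" -C "$workdir" rawdata

# -O writes the member's contents to stdout instead of recreating its path,
# so nothing under rawdata/ is touched and you pick the output name yourself.
tar -xOf "$workdir/file.tar.gz" rawdata/sample2/file.fa > "$workdir/copy_of_file.fa"

first_line=$(head -n 1 "$workdir/copy_of_file.fa")
echo "$first_line"
```

The trade-off is that you lose the original file name and permissions, so this fits best when you only need the contents of one file.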
If you want to, you can use -C to direct the output into another directory:
mkdir testdir #must exist first
tar xf file.tar.gz -C testdir rawdata/sample2/file.fa
This will result in a rawdata/sample2/file.fa tree being added to your testdir directory.
You can strip off the extra folders with --strip-components 1 (top folder name removed) or --strip-components 2 (top and next folder names removed).
tar xf file.tar.gz -C testdir --strip-components 2 rawdata/sample2/file.fa
#will just give you testdir/file.fa
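To try the -C and --strip-components combination safely before running it on a real 100GB archive, you can rehearse it on a throwaway archive. This is a minimal sketch under that assumption; the scratch directory workdir is a made-up name, and the layout simply mirrors the rawdata/sample2/file.fa example above.

```shell
# Build a tiny demo archive in a scratch directory (all names are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/rawdata/sample2" "$workdir/testdir"   # testdir must exist first
echo ">seq1" > "$workdir/rawdata/sample2/file.fa"
tar -czf "$workdir/file.tar.gz" -C "$workdir" rawdata

# Extract one member into testdir, dropping the two leading folder names
# (rawdata/ and sample2/) so only file.fa is written.
tar -xf "$workdir/file.tar.gz" -C "$workdir/testdir" --strip-components=2 rawdata/sample2/file.fa

ls "$workdir/testdir"
```

Note that the member is still named by its full path inside the archive; the stripping only affects where the file is written.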
Hopefully, that will help you wrangle your tar files more sanely (and without using a ton of space just to get one file)!