====== How to compress a data file ======

{{gtspring2009:dbthumb.png?24 }} How does one compress a large number of small files? I have over a terabyte of data currently stored as small MATLAB .mat files (~750 KB each) in a hierarchy of directories. Each top-level directory contains several GB of data, divided among a few hundred subdirectories that range in size from a couple of MB to a few hundred MB. I'm considering compressing the data, but I'm not 100% sure what the best way to do that is for such a large structure. Also, does anybody have any suggestions as to which compression format will work best? --- //[[dborrero@gatech.edu|D. Borrero]] 2009-07-08 13:29//

{{gtspring2009:gibson.png?24 }} Aren't MATLAB .mat files binary floating-point data, and thus unlikely to compress much? If there is room to be gained from compression (try gzipping an individual file), I would suggest one of two alternatives:

  - Compress the entire directory structure with "''tar cvfpz bigdir.tgz bigdir''", where bigdir is the name of the top-level directory. That will compress everything into one big tarfile, whose contents you can then list with "''tar tvfpz bigdir.tgz''" or extract with "''tar xvfpz bigdir.tgz''".
  - Compress files individually with "''gzip -r .''". That will descend recursively from the current directory and compress every file it finds. Alternatively, to compress only the .mat files, you could do "''find . -name '*.mat' -exec gzip {} \;''".

You can use ''bzip2'' instead of ''gzip'' in any of these commands (in the tar commands, use ''j'' in place of ''z''); ''bzip2'' is supposed to give better compression, but it doesn't always.
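
To act on the "try gzipping an individual file" suggestion without touching the data, something like the following sketch might help. It assumes a POSIX shell with ''gzip'', ''wc'', ''awk'' and ''mktemp'' available. The name ''gzip_ratio'' is a hypothetical helper, not a standard tool, and the demo file is a synthetic stand-in rather than a real .mat file.

```shell
#!/bin/sh
# gzip_ratio (hypothetical helper): report how much a file would shrink
# under gzip. The file itself is left untouched, because gzip -c writes
# the compressed stream to stdout instead of replacing the file.
gzip_ratio() {
    # $(( ... )) strips any padding that some wc implementations emit.
    orig=$(( $(wc -c < "$1") ))
    if [ "$orig" -eq 0 ]; then
        # Avoid dividing by zero on empty files.
        echo "$1: empty file"
        return 1
    fi
    comp=$(( $(gzip -c "$1" | wc -c) ))
    echo "$1: $orig -> $comp bytes ($((100 * comp / orig))%)"
}

# Demo on a throwaway, highly repetitive file (an assumption for
# illustration; real .mat files will likely compress far less).
demo=$(mktemp)
awk 'BEGIN { for (i = 0; i < 10000; i++) print "0123456789" }' > "$demo"
gzip_ratio "$demo"
rm -f "$demo"
```

Running it over a handful of representative files, for example ''for f in somedir/*.mat; do gzip_ratio "$f"; done'' (with ''somedir'' replaced by a real subdirectory), should give a rough idea of whether compressing the whole tree is worth the effort. A percentage close to 100 suggests there is little to gain.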

gtspring2009/howto/compress.txt · Last modified: 2010/02/02 07:55 (external edit)