7

Gzipping a tar file as a whole is dead easy and even built into tar as an option. So far, so good. However, from an archiver's point of view, it would be better to tar the gzipped individual files. (The rationale is that data loss is minimized if a single gzipped file is corrupt, whereas a whole tarball can be lost to a single gzip or copy error.)

Does anyone have experience with this? Are there drawbacks? Are there more solid/tested solutions for this than

find folder -exec gzip '{}' \;
tar cf folder.tar folder
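
As an aside, GNU gzip can recurse into a directory itself, so a roughly equivalent sketch of the same approach, without spawning one gzip process per file, would be:

# gzip -r descends into the directory and compresses each file in place
gzip -r folder
tar cf folder.tar folder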
Boldewyn

4 Answers

11

The key disadvantage is reduced compression, especially if your archive contains many small files.

You might be better off compressing the data the usual way (or, if you have CPU cycles to spare, with the slower but more space-efficient 7zip) and then wrapping the result in a parity-based fault-tolerant format such as Parchive (http://en.wikipedia.org/wiki/Parchive). This gives you a much greater chance of complete recovery after data corruption due to media failure or problems in transit over the network, while possibly not compromising too much on the size of the resulting archives.
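A minimal sketch of that workflow, assuming the par2cmdline tool is installed (file names and the 10% redundancy level are illustrative):

# compress the whole tree the usual way
tar czf folder.tar.gz folder
# create parity data with 10% redundancy
par2 create -r10 folder.tar.gz.par2 folder.tar.gz
# later: check the archive, and repair it if blocks were corrupted
par2 verify folder.tar.gz.par2
par2 repair folder.tar.gz.par2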

David Spillett
  • Dang, beat me to it! +1, since modern compressors + forward error correction = better-protected and most likely still smaller overall files than either way of using tar + gzip. More info at http://www.par2.net – Mokubai May 19 '10 at 20:51
  • This really is the proper way to do things! `tar` creates a big container for everything, `gzip` removes useless redundancy from the container, and `par` adds some redundancy back in a uniform, carefully designed way. (I have never used `par` myself, but I know the principle.) – user39559 Sep 09 '10 at 10:03
  • Compressing each file first can be a big waste: it's as if you stopped gzip just when it was starting to take effect, and restarted it for the next file. In an extreme test case, 100 copies of a random (maximal-entropy) file have a compression factor below 1 with gzip+tar, but can have a compression factor close to 100 with tar+gzip (see the sketch after these comments). – user39559 Sep 09 '10 at 10:04
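
A hedged demonstration of that extreme case (file names and sizes are illustrative; the file must be smaller than gzip's 32 KiB DEFLATE window for the repeats to be deduplicated):

# one small random (incompressible) file, copied 100 times
head -c 10240 /dev/urandom > random.bin
mkdir copies
for i in $(seq 1 100); do cp random.bin "copies/file$i"; done
# tar-then-gzip: the repeated content falls inside gzip's window and shrinks away
tar cf - copies | gzip > tar-then-gzip.tar.gz
# gzip-then-tar: each random file is incompressible on its own
gzip -r copies
tar cf gzip-then-tar.tar copies
ls -l tar-then-gzip.tar.gz gzip-then-tar.tar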
4

If you're going to do it this way, then use the tried-and-true method:

zip -r folder.zip folder
Ignacio Vazquez-Abrams
  • I'm not sure, but isn't this the same as .tar.gz? In other words, does zip compress the individual files and then add them in a simple concatenating way? My experience with corrupted ZIP files so far has been that zip refuses to handle the archive at all (i.e., the same as with a corrupt .tar.gz archive). – Boldewyn May 19 '10 at 20:40
  • @boldewyn: yes, that's how zip works. It is a container format (a bit like tar), where you can specify a "storage" method per entry: either compress ("deflate") or just "store" (see the sketch below). – akira May 20 '10 at 06:31
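
A hedged sketch of inspecting those per-entry methods and testing an archive for corruption, assuming Info-ZIP's zip/unzip tools:

# list each entry with its storage method ("Defl" or "Stored")
unzip -v folder.zip
# test the checksum of every entry
unzip -t folder.zip
# attempt to salvage entries from a damaged archive
zip -FF folder.zip --out repaired.zip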
0

Why not just toss the --verify (or -W) flag at tar? This will verify that the contents match the source.
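For example, with GNU tar (this only works for uncompressed archives, as the comment below notes):

# write the archive, then read it back and compare against the source files
tar -cWf folder.tar folder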

Jack M.
  • ...and it doesn't work with the `-z` or `-j` flag. Also, verification *during* archiving doesn't help against corruption afterwards, e.g., an unnoticed bit flip while copying to the backup device; a separate checksum, as sketched below, can catch that. – Boldewyn May 20 '10 at 07:19
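
One hedged way to detect such later corruption is to store a checksum alongside the archive (sha256sum is an assumption; any strong hash works):

# record a checksum when the archive is created ...
sha256sum folder.tar > folder.tar.sha256
# ... and verify it after copying to the backup device
sha256sum -c folder.tar.sha256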
0

What do you want to back up? If permissions don't matter (e.g., these are not system files), I'd go with 7zip. It provides much better performance (multi-core/CPU) with much better compression.
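A minimal sketch, assuming the p7zip command-line tool (the compression level is illustrative); note that 7z does not store Unix owners/permissions, hence the caveat above:

# -mx=9 selects maximum compression; 7z uses multiple cores where it can
7z a -mx=9 folder.7z folder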

Apache