
I have the files for a game mod. This mod requires some music files to be present twice, in different folders. Given that the music is the same in both folders, is there a way to store the files only once in a zip, then change the table of contents so the second set of entries references the first copy, such that extracting the zip creates the files twice even though they are stored only once?

Similar to creating an ISO with a modified TOC (though I don't know how to do that either).

An example of what the zip would have:

mod.zip
\music\set_a\tune1.mp3
\music\set_a\tune2.mp3
\music\set_a\tune3.mp3
\music\set_a\tune4.mp3
\music\set_a\tune5.mp3
\music\set_a\tune6.mp3
\music\set_b\tune1.mp3
\music\set_b\tune2.mp3
\music\set_b\tune3.mp3
\music\set_b\tune4.mp3
\music\set_b\tune5.mp3
\music\set_b\tune6.mp3
\graphics\set_a\img1.png
\graphics\set_a\img2.png
\graphics\set_b\img1.png
\graphics\set_b\img2.png

Imagine that the tunes for set_a and set_b are identical, but the graphics for set_a and set_b are not.

In an ideal world, I would replace all mp3 files in set_b with zero-length files, create the zip, then alter the index to make the set_b entries refer to the set_a data, so that extraction creates music\set_b\tune1.mp3 using the data of music\set_a.

Is that possible? If not, any other easy way to create something similar?

Attie
LPChip
  • Does it specifically have to be a .zip archive, as opposed to (for example) .rar or .tar.gz? – u1686_grawity Apr 11 '20 at 20:44
  • @user1686 preferably, but if you have an answer that involves another compressed archive, I'm all ears. – LPChip Apr 11 '20 at 20:47
  • 1
    Would symlinks in a tarball be acceptable? (guessing not) – Attie Apr 11 '20 at 21:14
  • Hardlinks probably would. – u1686_grawity Apr 11 '20 at 21:15
  • @user1686 quite right (I had expected no)... producing a tarball with "_one 1MiB file_" is the same size as a tarball with "_one 1MiB file + a hardlink to it_"... a tarball with "_two identical 1MiB files_" is larger. – Attie Apr 11 '20 at 21:18
  • 1
    Almost any archive format *other than bog-standard zip* will do this for you for free, by default. Most "solid" archive formats group files of the same type and name together, so the compressor sees the two identical files back to back and saves, if not the entire file, then a very significant amount far in excess of what zip would achieve. As stated by user1686, that is far better than manually abusing a file format into something that may or may not be supported by any given extractor. My personal favourite is, as already suggested by user1686, 7-zip. – Mokubai Apr 11 '20 at 21:50
  • @Mokubai I tried using .zip which compressed 60mb into 59mb. I then tried 7z which made it 57mb. Unless I'm missing something with the compression options, 7zip does not seem to do it. – LPChip Apr 11 '20 at 22:52
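The hardlink observation from the comments above can be checked quickly. This is a sketch assuming GNU tar and coreutils on Linux; the file and directory names are made up for the demonstration:

```shell
# A hardlink in a tarball is stored as a link entry, not a second copy of the data.
mkdir -p demo/music/set_a demo/music/set_b
dd if=/dev/urandom of=demo/music/set_a/tune1.mp3 bs=1M count=1 2>/dev/null
ln demo/music/set_a/tune1.mp3 demo/music/set_b/tune1.mp3   # hardlink, not a copy
tar -cf with_link.tar -C demo .
rm demo/music/set_b/tune1.mp3
cp demo/music/set_a/tune1.mp3 demo/music/set_b/tune1.mp3   # a real second copy
tar -cf with_copy.tar -C demo .
ls -l with_link.tar with_copy.tar   # the hardlink version is roughly half the size
```

Extracting with_link.tar recreates both paths, with the data present only once in the archive.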

2 Answers


Probably the simplest alternative is to use a "solid" archive format. This is always how .tar.foo archives work, and it is a selectable option for the .rar and .7z formats.

In this mode, the archive's contents are concatenated together and compressed as a single continuous stream, meaning that repetitions will be detected across files as well – and identical files should get deduplicated as part of the regular compression.

(The downsides of this mode are that it makes extracting individual files slow and the archive cannot be updated easily.)

Note: This other thread (which was closed) has answers saying that this only works with relatively small amounts of data compared to the dictionary size parameter. But at least it's less risky than making nonstandard changes to the already-horrible .zip structure.
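As a quick sanity check of the solid-stream behaviour, here is a sketch assuming GNU tar and xz are installed (the file names are made up; two identical files of incompressible data compress down to roughly the size of one):

```shell
# Two identical 5 MiB files of random (incompressible) data.
dd if=/dev/urandom of=tune1.mp3 bs=1M count=5 2>/dev/null
cp tune1.mp3 tune2.mp3
# Compress both as one continuous stream; xz -9 uses a 64 MiB dictionary,
# so the second file is found as one long repeat of the first.
tar -cf - tune1.mp3 tune2.mp3 | xz -9 > solid.tar.xz
ls -lh solid.tar.xz   # roughly 5 MiB, not 10 MiB
```

With 7-Zip the equivalent is to enable solid mode and raise the dictionary size, e.g. `7z a -ms=on -md=64m archive.7z music graphics` (switch names as documented by p7zip; the archive name is illustrative).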

u1686_grawity
  • Can you tell me how I can create a solid .7z archive? I don't mind setting the compression to ultra, etc. In my experiments so far, with mp3s of around 5 MB each, the compression does not detect the identical files, and a 60 MB folder is compressed to a 57 MB archive. – LPChip Apr 11 '20 at 22:56
  • 1
    EDIT: Never mind. I set the dictionary size to 60 MB (I need 30 MB), and it now compressed to 27 MB, the size I expected it to be. :D awesome. – LPChip Apr 11 '20 at 22:58

zpaq does this for you: it has built-in deduplication, it is open source, and it runs on at least Windows and Linux (where it is probably already packaged).

This is a quick check on Linux:

$ dd if=/dev/urandom bs=1M of=file1 count=10   # 10 MiB of random data
$ cp file1 file2                               # an exact duplicate
$ zpaq add archive.zpaq file1 file2
$ ls -lh archive.zpaq

See the size of the archive. Note also that we did not provide any information about the duplication of the files: no soft or hard links were involved.

$ rm file1 file2
$ zpaq extract archive.zpaq 
$ ls -lh file1 file2