File Compression Guide

When sending files online, good compression is essential for saving bandwidth. While many people these days have quite fast download speeds, even more have speeds below 8 Mbps.

File compression consists of two parts: the archive format and the compression algorithm. Many archive formats support various compression algorithms. The most notable example of this is the .tar archive: when it is compressed, it’s common practice to add the type of compression as a suffix, for example .tar.gz or .tar.bz2. Other formats like .zip, .rar and .7z often specify a preferred compression method.
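To make the split between archive and algorithm concrete, here is a minimal sketch using Python's standard library. It builds one uncompressed tar archive in memory and then wraps that same archive with three different algorithms. The file names and contents are made up for illustration, and .tar.xz is my own addition to the suffixes mentioned above.

```python
import bz2
import gzip
import io
import lzma
import tarfile


def make_tar(files: dict[str, bytes]) -> bytes:
    """Bundle named byte strings into an uncompressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


# Hypothetical sample contents, chosen only because they are highly repetitive.
files = {
    "notes.txt": b"compression guide\n" * 2000,
    "log.txt": b"INFO nothing happened\n" * 2000,
}
tar_bytes = make_tar(files)

# The archive stays the same; only the compression algorithm changes.
compressed = {
    ".tar.gz": gzip.compress(tar_bytes),
    ".tar.bz2": bz2.compress(tar_bytes),
    ".tar.xz": lzma.compress(tar_bytes),
}

print(f".tar: {len(tar_bytes)} bytes")
for suffix, blob in compressed.items():
    print(f"{suffix}: {len(blob)} bytes")
```

Decompressing any of the three gives back the identical .tar bytes, which is why the suffix only needs to name the algorithm.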

For this article I’m going to be using 7-Zip, which offers a variety of compression algorithms and archive types. It’s also completely free and open source.

Testing

This test will be done on three different types of files: the first is the NVIDIA driver installer (361.91-desktop-win10-64bit-international-whql.exe), the second is a PDF book and the third is a large plain text file. This is important since the compression ratio depends on the file type. For instance, installers are typically already compressed, so I expect minimal compression there.
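The effect of file type is easy to reproduce on a small scale. The sketch below uses Python's lzma module rather than 7-Zip, and the two inputs are stand-ins I made up, not the files from this test: random bytes behave much like already-compressed data, and a repeated sentence behaves like an extreme case of plain text.

```python
import lzma
import os


def ratio_percent(data: bytes) -> float:
    """Compressed size as a percentage of the original size (lower is better)."""
    return 100 * len(lzma.compress(data)) / len(data)


# Stand-ins for real files (assumptions for illustration only).
already_compressed_like = os.urandom(1_000_000)
text_like = b"The quick brown fox jumps over the lazy dog.\n" * 20_000

print(f"random bytes:  {ratio_percent(already_compressed_like):.1f}%")
print(f"repeated text: {ratio_percent(text_like):.1f}%")
```

Real text will not compress nearly as well as a single repeated sentence, but the gap between the two kinds of input points in the same direction.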

Files       Uncompressed Size
Installer   321 MB (337,507,360 bytes)
PDF Book    114 MB (120,225,893 bytes)
Text File   9.13 MB (9,584,473 bytes)

For the first benchmark I will compress each file with LZMA2 in a 7z archive, which is the default and recommended for 7-Zip. Other options are left at their defaults: compression level normal, dictionary size 16 MB, word size 32, solid block size 2 GB, CPU threads 2.
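These settings can also be given on the command line, which makes a benchmark easier to repeat. The sketch below only builds the argument list. The mapping of each setting to a switch is my reading of the 7-Zip command-line documentation, so treat it as an assumption and check it against your version. The archive and file names are placeholders.

```python
import shlex


def build_7z_command(archive: str, source: str) -> list[str]:
    """Build a 7z command line mirroring the settings listed above (assumed mapping)."""
    return [
        "7z", "a",     # add files to an archive
        "-t7z",        # 7z archive type
        "-m0=lzma2",   # LZMA2 compression method
        "-mx=5",       # compression level: normal
        "-md=16m",     # dictionary size: 16 MB
        "-mfb=32",     # word size (fast bytes): 32
        "-ms=2g",      # solid block size: 2 GB
        "-mmt=2",      # CPU threads: 2
        archive,
        source,
    ]


# Placeholder names, not the files used in this article.
command = build_7z_command("installer.7z", "installer.exe")
print(shlex.join(command))
```

With 7-Zip installed and on the PATH, the list can be passed to subprocess.run(command, check=True) to perform the compression.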

Files       Compressed Size   Compression Ratio   Compression Time
Installer   321 MB            100%                ~43 seconds
PDF         109 MB            95.6%               ~17 seconds
Text        1.40 MB           15.3%               ~4 seconds

As we can see from these results, plain text has by far the best compression ratio (the ratio is the compressed size as a percentage of the original, so lower is better). The installer did not benefit at all, and in some cases compressing data like this may actually increase the size. The PDF had a modest improvement in size of about 4%, but this depends on how the PDF itself is compressed.
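For reference, the ratio column is simple arithmetic: compressed size divided by uncompressed size, times 100. This short sketch recomputes it from the rounded sizes in the tables above. The function name is my own.

```python
def compression_ratio(compressed_mb: float, original_mb: float) -> float:
    """Compressed size as a percentage of the original, rounded to one decimal."""
    return round(100 * compressed_mb / original_mb, 1)


# (compressed MB, uncompressed MB) taken from the tables above.
results = {
    "Installer": (321, 321),
    "PDF": (109, 114),
    "Text": (1.40, 9.13),
}

for name, (compressed_mb, original_mb) in results.items():
    print(f"{name}: {compression_ratio(compressed_mb, original_mb)}%")
```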

Now let’s try again but with the compression level set to ultra.

Files       Compressed Size   Compression Ratio   Compression Time
Installer   n/a               n/a                 n/a
PDF         107 MB            93.8%               ~26 seconds
Text        1.39 MB           15.2%               ~4 seconds

The results of this are rather interesting. The installer caused 7-Zip to freeze on ultra, so I was unable to see if there is any compression. The PDF shows a small further gain at the cost of compression time, while the text file remains mostly the same.

Compression level isn’t the only thing you can tweak. Dictionary size can have a major effect on the compression ratio, but it also enormously increases the memory requirement for compression and decompression. The default of 16 MB is rather conservative. Ultra defaults to 64 MB, which is much better, and you can get a little more by increasing it further, although going above 128 MB generally gives minimal gain.
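Why dictionary size matters can be shown with a contrived example. The dictionary limits how far back the compressor can look for repeated data, so a repeat that lies further back than the dictionary size goes unnoticed. The sketch below uses Python's lzma module (LZMA2 in an .xz container, not 7-Zip itself), and the data layout and sizes are assumptions chosen to make the effect obvious.

```python
import lzma
import os


def compress_with_dict(data: bytes, dict_size: int) -> bytes:
    """Compress with LZMA2, overriding only the dictionary size of preset 6."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


# A 256 KiB block, 1 MiB of unrelated data, then the same block again.
# The repeat sits 1.25 MiB back, out of reach of a 256 KiB dictionary.
block = os.urandom(256 * 1024)
gap = os.urandom(1024 * 1024)
data = block + gap + block

small = compress_with_dict(data, 256 * 1024)       # cannot see the repeat
large = compress_with_dict(data, 4 * 1024 * 1024)  # can see the repeat

print(f"original:      {len(data)} bytes")
print(f"256 KiB dict:  {len(small)} bytes")
print(f"4 MiB dict:    {len(large)} bytes")
```

The larger dictionary saves roughly the size of the repeated block. On real data the repeats are less clear-cut, which fits with the gains tailing off as the dictionary grows.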

This test is a little unrealistic, as you will often be compressing many files at once. Let’s try a mix of different file types with an uncompressed size of 132 MB.

Compression          Compressed Size   Compression Ratio   Compression Time
Default              117 MB            88.6%               ~9 seconds
Ultra                90 MB             68.18%              ~24 seconds
Ultra + 128MB Dict   89.7 MB           67.95%              ~22 seconds

I was a little surprised that the larger dictionary size actually took slightly less time (~22 seconds against ~24), although with rough, single-run timings a gap that small may not mean much. It really goes to show that the types of files determine how far you can compress more than anything else.

Conclusion

I was expecting more definitive results as to what is better, but as these tests show, it varies on a case by case basis. I would certainly recommend you stick to LZMA2, as benchmarks published by many other people have shown it to be the best in terms of compression ratio, memory and, for the most part, compression time. Things like .zip with deflate (e.g. WinZip) should be avoided these days.

If you really need good compression, then the only true way to get it is to test various settings on the files you are trying to compress.
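That kind of testing is easy to script. Here is a minimal sketch of a benchmark loop using Python's lzma module. Its presets are xz presets, not 7-Zip's compression levels, and the sample data is made up, so the numbers only indicate how settings compare on your own data.

```python
import lzma
import time


def benchmark(data: bytes, presets: dict[str, int]) -> list[tuple[str, float, float]]:
    """Return (label, ratio in percent, seconds taken) for each preset."""
    rows = []
    for label, preset in presets.items():
        start = time.perf_counter()
        compressed = lzma.compress(data, preset=preset)
        elapsed = time.perf_counter() - start
        rows.append((label, 100 * len(compressed) / len(data), elapsed))
    return rows


# Hypothetical sample; for a real file use open(path, "rb").read() instead.
sample = b"".join(f"line {i}: some log text\n".encode() for i in range(50_000))

presets = {
    "preset 0": 0,
    "preset 6 (default)": 6,
    "preset 6 extreme": 6 | lzma.PRESET_EXTREME,
}

rows = benchmark(sample, presets)
for label, ratio, seconds in rows:
    print(f"{label:<20} {ratio:6.2f}%  {seconds:.2f} s")
```

Running each setting several times and on the actual files you plan to send gives far more trustworthy numbers than a single pass over synthetic data.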

For things like video, audio and images, compression isn’t really the answer. These files are usually already compressed by their own format, so archiving them can only go so far, and using a different format or codec is the way to go.