
Compression: gzip vs bzip2 vs 7-zip

A trade-off between time and space

Today I had a look at the different options for compressing files (in this case for backup purposes) on an Ubuntu system. The most common compression tools are gzip and bzip2. Both have been around for a long time, are available by default on most systems and are nicely integrated with other utilities like GNU tar (via its -z and -j options).
7-zip and the algorithm it uses (LZMA) are not that common on UNIX-like operating systems. 7-zip is well known as a free alternative to WinZip on Windows; the project was started back in 1998. For Ubuntu, p7zip (a port of 7-zip to POSIX) is available in the universe repository (sudo apt-get install p7zip).
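For readers who script backups in Python rather than in the shell, the standard-library tarfile module offers a similar kind of integration. The sketch below is only an illustration, not the setup used for this test. The mode strings "w:gz" and "w:bz2" correspond to GNU tar's -z and -j. "w:xz" uses LZMA2 in the .xz container, which is related to, but not the same as, a .7z archive. The function name make_archives is hypothetical.

```python
import tarfile
from pathlib import Path

# tarfile mode strings and the GNU tar options they resemble:
#   "w:gz"  ~ tar -z (gzip)
#   "w:bz2" ~ tar -j (bzip2)
#   "w:xz"  ~ tar -J (xz/LZMA2; NOT the .7z container)
MODES = {
    "gzip": ("w:gz", ".tar.gz"),
    "bzip2": ("w:bz2", ".tar.bz2"),
    "xz": ("w:xz", ".tar.xz"),
}


def make_archives(source: Path, out_dir: Path) -> dict:
    """Pack `source` once per compressor; return {compressor: archive path}."""
    archives = {}
    for name, (mode, suffix) in MODES.items():
        target = out_dir / (source.name + suffix)
        with tarfile.open(target, mode) as tar:
            # Store the file under its bare name, without leading directories.
            tar.add(source, arcname=source.name)
        archives[name] = target
    return archives
```

For example, make_archives(Path("dump.sql"), Path("/tmp")) would write dump.sql.tar.gz, dump.sql.tar.bz2 and dump.sql.tar.xz. The file names here are placeholders.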

My test file was a MySQL dump with a size of 163 MB that contains mostly text. I was interested in the compressed file size and in the time it takes to compress and uncompress the file.

Here are the results:

Compressor  Compressed size  Ratio  Compression time  Decompression time
gzip        89 MB            54 %   0m 13s            0m 05s
bzip2       81 MB            49 %   1m 30s            0m 20s
7-zip       61 MB            37 %   1m 48s            0m 11s

(Ratio: compressed size relative to the original 163 MB.)

For the test I ran all tools with their default settings, i.e. without providing any special options.
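A measurement like this can be sketched with Python's gzip, bz2 and lzma modules. This is an approximation built on several assumptions, not the command-line test above. The compression levels are set to the command-line defaults for gzip (6) and bzip2 (9). Python's lzma preset 6 stands in for 7-zip's LZMA and produces .xz data rather than a .7z archive. The whole file is also held in memory. The name benchmark is hypothetical.

```python
import bz2
import gzip
import lzma
import time

# Levels mirror the command-line defaults: gzip -6, bzip2 -9.
# lzma preset 6 is only a stand-in for 7-zip: it writes .xz data, not .7z.
COMPRESSORS = {
    "gzip": (lambda d: gzip.compress(d, compresslevel=6), gzip.decompress),
    "bzip2": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    "lzma": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}


def benchmark(data: bytes) -> list:
    """Return (name, compressed bytes, ratio, compress s, decompress s) per tool.

    `data` must be non-empty, because the ratio divides by its length.
    """
    rows = []
    for name, (compress, decompress) in COMPRESSORS.items():
        start = time.perf_counter()
        packed = compress(data)
        middle = time.perf_counter()
        unpacked = decompress(packed)
        end = time.perf_counter()
        if unpacked != data:  # sanity check: the round trip must be lossless
            raise RuntimeError(f"{name}: round trip failed")
        rows.append((name, len(packed), len(packed) / len(data),
                     middle - start, end - middle))
    return rows
```

To run it on your own dump, read the file as bytes and pass it in, for example benchmark(open("dump.sql", "rb").read()). The name dump.sql is a placeholder.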

Gzip is still a great tool: it provides good compression without consuming a lot of computing power. Bzip2 is much slower and provides only slightly better compression. 7-zip needs a few more CPU cycles than bzip2 but produces far smaller files. 7-zip also decompresses faster than bzip2 (11 s versus 20 s).

So if time is important (think of on-the-fly compression), gzip is the tool of choice. If you don't care too much about processing speed and need very good compression, have a look at 7-zip. The only advantage bzip2 has over 7-zip is that it is part of most default installations and is more widespread. Let's hope this will change in the future; integration with GNU tar in particular would be great.
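The trade-off described here can be turned into a simple decision rule. Measure each compressor on a sample of your data, then keep the one with the smallest output that still fits a time budget. This minimal sketch assumes Python's gzip, bz2 and lzma modules as stand-ins for the three tools (lzma writes .xz, not .7z). The function pick_compressor and the idea of a per-sample time budget are hypothetical. Timing a single run on a small sample is noisy, so treat the result as a hint rather than a verdict.

```python
import bz2
import gzip
import lzma
import time
from typing import Optional

# Stand-ins for the three tools, at their command-line default levels.
CANDIDATES = {
    "gzip": lambda d: gzip.compress(d, compresslevel=6),
    "bzip2": lambda d: bz2.compress(d, compresslevel=9),
    "lzma": lambda d: lzma.compress(d, preset=6),
}


def pick_compressor(sample: bytes, time_budget_s: float) -> Optional[str]:
    """Name of the smallest-output compressor whose time fits the budget.

    Returns None if no compressor finishes within `time_budget_s` seconds.
    """
    best_name, best_size = None, None
    for name, compress in CANDIDATES.items():
        start = time.perf_counter()
        size = len(compress(sample))
        elapsed = time.perf_counter() - start
        if elapsed <= time_budget_s and (best_size is None or size < best_size):
            best_name, best_size = name, size
    return best_name
```

With an unlimited budget, the function simply returns whichever tool compresses the sample best. A tight budget pushes it toward gzip, which matches the conclusion above.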

Re: Compression: gzip vs bzip2 vs 7-zip

While your results are correct, you can't generalize them: it depends on the file type. 7-zip won't always be better (though it will be most of the time).

Re: Compression: gzip vs bzip2 vs 7-zip

What about the gzip compression level? Did you run gzip -9 or use the default (which is 6)?

Re: Compression: gzip vs bzip2 vs 7-zip

I've used all three tools with their default settings.
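The level question is easy to check on your own data. This is a minimal sketch using Python's gzip module, and gzip_sizes is a hypothetical helper name. The gzip command line defaults to level 6, while Python's gzip.compress() defaults to 9, so the sketch passes the level explicitly.

```python
import gzip


def gzip_sizes(data: bytes, levels=(1, 6, 9)) -> dict:
    """Compressed size in bytes for each requested gzip level.

    compresslevel is passed explicitly because Python's default (9)
    differs from the gzip command-line default (6).
    """
    return {level: len(gzip.compress(data, compresslevel=level))
            for level in levels}
```

Comparing the entries for 6 and 9 shows how much the extra effort of -9 buys on a given file.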

Re: Compression: gzip vs bzip2 vs 7-zip

I find the following settings to be optimal for text compression (specifically JavaDocs)...
-m0=PPMd:mem30:o32
...it uses PPMd rather than LZMA; PPMd is better suited to text.

Re: Compression: gzip vs bzip2 vs 7-zip

In fact, it is not. I have a test sample of many saved HTML pages that contain only HTML text, without any bitmap data; it is about 1.2 GB. With the default PPMd settings, the result is 51 MB. With LZMA, it is 11 MB.

Re: Compression: gzip vs bzip2 vs 7-zip

I have been playing with paq8o8 for a while. It is known as the very best compression software available (i.e. it has won the Hutter Prize repeatedly), but the amount of RAM and time it needs makes it unusable.

If you are looking for the best entropy and memory/time is not an issue, you can go crazy with something like a context mixer with lots of file-format models hard-coded in assembly. It performs far better than any LZMA, LZ77, deflate or Huffman implementation. It wins.
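To make "context mixer" a little more concrete, here is a simplified sketch of the logistic mixing step used in PAQ-style compressors. Each model predicts the probability that the next bit is 1. The mixer combines these predictions as a weighted sum in the stretched (logit) domain. After each bit, it nudges the weights toward the models that predicted well. This is an illustration only. paq8 itself has many models, context-selected weight sets, an arithmetic coder and much more, and none of that is shown here. The learning rate and initial weight are arbitrary assumptions, and the class name is hypothetical.

```python
import math


def stretch(p: float) -> float:
    """Map a probability to the logistic domain: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def squash(x: float) -> float:
    """Inverse of stretch: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


class LogisticMixer:
    """Combine several models' predictions that the next bit is 1."""

    def __init__(self, n_models: int, learning_rate: float = 0.02,
                 initial_weight: float = 0.3):
        self.weights = [initial_weight] * n_models
        self.learning_rate = learning_rate
        self._inputs = [0.0] * n_models
        self._p = 0.5

    def predict(self, probabilities: list) -> float:
        # Clamp inputs so stretch() never sees exactly 0 or 1.
        clamped = [min(max(p, 1e-6), 1.0 - 1e-6) for p in probabilities]
        self._inputs = [stretch(p) for p in clamped]
        # Weighted sum in the stretched domain, squashed back to a probability.
        dot = sum(w * x for w, x in zip(self.weights, self._inputs))
        self._p = squash(dot)
        return self._p

    def update(self, bit: int) -> None:
        # Gradient step on coding cost: weights grow for models whose
        # stretched prediction pointed toward the actual bit.
        error = bit - self._p
        self.weights = [w + self.learning_rate * error * x
                        for w, x in zip(self.weights, self._inputs)]
```

In a real compressor, the mixed probability would drive an arithmetic coder. Coding a bit costs -log2 of the probability assigned to it, which is why better predictions mean smaller files.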

In my case, for my own uses, a difference of +/- 15 MB when compressing a 2 GB file is acceptable, so I believe that going mad over a tiny percentage of extra compression is pointless. I'm using bzip2 now because its time/compression trade-off is good. I'm not waiting hours for compression/decompression anymore.

Re: Compression: gzip vs bzip2 vs 7-zip

Nice test, but there are more extensive benchmarks out there. I admit it's mainly Windows-based, but you will find a lot of *nix-type compressors like gzip, bzip, bzip2, 7-zip, StuffIt etc. Enjoy.

Re: Compression: gzip vs bzip2 vs 7-zip

Thanks for the link to your impressive website, Werner.

Re: Compression: gzip vs bzip2 vs 7-zip

One thing that you don't mention is multithreading. My understanding (and please, someone correct me if I'm wrong) is that bzip2 and 7-zip can use multiple cores (threads) simultaneously, whereas gzip cannot. This effectively makes 7-zip compression much faster on multi-core systems, with better compression than gzip or bzip2.

Re: Compression: gzip vs bzip2 vs 7-zip

Yes, 7-zip takes advantage of multiple cores/cpus. For bzip2 and gzip there are variants of the original tools (pbzip2 and mgzip) that do the same.
The test was performed on a single-core machine to measure the processing power the tools consume, ignoring whether or not the implementations support parallel execution.
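As far as I understand it, the parallel bzip2 variants work roughly like this sketch. They split the input into independent chunks, compress each chunk as a separate bzip2 stream on its own core, and concatenate the streams. bzip2 decompressors, including Python's bz2.decompress(), accept such concatenated streams. This is not pbzip2's actual code. The function name parallel_bz2, the worker count and the 900,000-byte default chunk size (bzip2's block size at -9) are assumptions. CPython's bz2 module releases the GIL while compressing, so a thread pool can keep several cores busy; a process pool would also work.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def parallel_bz2(data: bytes, chunk_size: int = 900_000, workers: int = 4) -> bytes:
    """Compress fixed-size chunks concurrently; return concatenated bzip2 streams."""
    # Split into independent chunks; keep one empty chunk for empty input
    # so the result is still a valid bzip2 stream.
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so streams line up with chunks.
        streams = list(pool.map(lambda c: bz2.compress(c, compresslevel=9), chunks))
    return b"".join(streams)
```

Because every chunk is a complete stream, plain bzip2 -d (or bz2.decompress) can read the output without knowing it was produced in parallel.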

Re: Compression: gzip vs bzip2 vs 7-zip

I performed a similar test a while ago.

I knew gzip would not compress as well, so I left it out of the test.

I was rather surprised that bzip2 came out ahead of 7-zip in both compression ratio and speed.

About the DB (a single table):
An auto-incrementing ID,
two VARCHAR columns of 30 and 50 characters, respectively,
an INT column (incremented on update),
a timestamp.

More than 50,000 rows.
The table totaled around 10 MiB (as SQL).

I figure it is highly dependent on the data provided.

Re: Compression: gzip vs bzip2 vs 7-zip

Yes, indeed the best advice for anybody looking for a good compression solution is to measure with their own data.

Re: Compression: gzip vs bzip2 vs 7-zip

I tested p7zip vs bzip2 using the following parameters:
7za a -t7z -mx=9
bzip2 -9
on a tar file of a MySQL database.
Original size: 1083 MB
7za: 120,290 KB
bzip2: 176,182 KB

Re: Compression: gzip vs bzip2 vs 7-zip

If you need speed, an even better choice than gzip is LZO compression with its tool lzop. It compresses much worse than gzip, but is much faster. It is a good choice for backup compression on slow machines, where gzipping takes too long.
Input: 274 MB tar of source files.
gzip: 62 MB in 23 seconds.
lzop: 95 MB in 3.5 seconds (limited by hard-drive speed).
Decompression is incredibly fast, in some cases even faster than memcpy. I often use LZO in embedded applications, in the bootloader, to decompress the program image (similar to the way Linux is compressed with gzip).

The best packer for compression - climate protection at kernel.org

Various packers and options compared: when does which packer make sense? And how can they be used to protect the environment?
