Compression: gzip vs bzip2 vs 7-zip

Today I looked at the different options for compressing files (in this case for backup purposes) on an Ubuntu system. The most common tools are gzip and bzip2. Both have been around for a long time, are available on most systems by default, and are nicely integrated with other utilities such as GNU tar (via its -z and -j options).

7-zip and the algorithm it uses (LZMA) are not that common on UNIX-like operating systems. 7-zip was started back in 1998 and is well known as a free alternative to WinZip on Windows. For Ubuntu, p7zip, a port of 7-zip to POSIX systems, is available in the universe repository (sudo apt-get install p7zip).

My test file was a MySQL dump with a size of 163 MB that contains mostly text. I was interested in the compressed file size and in the time it takes to compress and uncompress the file.

Here are the results:

Compressor   Size    Ratio   Compression time   Decompression time
gzip         89 MB   54 %    0m 13s             0m 05s
bzip2        81 MB   49 %    1m 30s             0m 20s
7-zip        61 MB   37 %    1m 48s             0m 11s

For the test I ran all tools with their default settings, i.e. without providing any special options.
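To repeat this kind of comparison on your own data, here is a minimal sketch using Python's standard-library gzip, bz2 and lzma modules. It assumes these bindings are close enough to the command-line tools: they wrap zlib, libbzip2 and liblzma, but lzma writes the .xz container (LZMA2) rather than 7-zip's .7z format, so sizes and timings will not match the table exactly. Python's gzip.compress defaults to level 9 while the gzip command defaults to 6, so level 6 is passed explicitly to mirror "default settings".

```python
import bz2
import gzip
import lzma
import sys
import time

# name -> (compress, decompress), approximating each CLI tool's defaults.
COMPRESSORS = {
    # gzip CLI defaults to -6; Python's gzip.compress defaults to 9.
    "gzip": (lambda d: gzip.compress(d, compresslevel=6), gzip.decompress),
    # bzip2 CLI and bz2.compress both default to level 9 (900k blocks).
    "bzip2": (bz2.compress, bz2.decompress),
    # Preset 6 is the default for both xz and lzma.compress (LZMA2 in .xz).
    "lzma (xz)": (lzma.compress, lzma.decompress),
}


def benchmark(data: bytes) -> list:
    """Return (name, size, ratio_percent, compress_s, decompress_s) per tool."""
    results = []
    for name, (compress, decompress) in COMPRESSORS.items():
        start = time.perf_counter()
        packed = compress(data)
        mid = time.perf_counter()
        unpacked = decompress(packed)
        end = time.perf_counter()
        if unpacked != data:
            raise RuntimeError(f"{name}: round trip failed")
        ratio = 100.0 * len(packed) / len(data)
        results.append((name, len(packed), ratio, mid - start, end - mid))
    return results


if __name__ == "__main__":
    # Usage: python bench.py dump.sql [more files ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        print(path)
        for name, size, ratio, t_c, t_d in benchmark(data):
            print(f"  {name:<10} {size:>12} bytes {ratio:5.1f}%  "
                  f"compress {t_c:6.2f}s  decompress {t_d:6.2f}s")
```

Ratio here is the compressed size as a percentage of the original, matching the Ratio column above. Run it on a single-core machine (or pin it to one core) if you want numbers comparable to the ones in this post.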

Gzip is still a great tool and provides good compression without consuming a lot of computing power. Bzip2 is much slower and provides only slightly better compression. 7-zip takes somewhat longer than bzip2 to compress (1m 48s vs. 1m 30s) but produces far smaller files, and it also decompresses faster than bzip2 (11 s vs. 20 s).

So if time is important (think of on-the-fly compression), gzip is the tool of choice. If you don't care too much about processing speed and need very good compression, have a look at 7-zip. The only advantage bzip2 has over 7-zip is that it is part of most default installations and is more widely used. Let's hope this will change in the future; integration with GNU tar in particular would be great.
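For comparison, Python's tarfile module (a separate implementation, not GNU tar) can already write tar archives with any of the three compressors. Note that its "w:xz" mode uses LZMA2 in an .xz container, the same algorithm family as 7-zip's LZMA but not the .7z format. The sketch below builds archives in memory; the member name and contents are hypothetical.

```python
import io
import tarfile

# Mode suffixes pick the compressor, much like tar's -z / -j options.
MODES = {"gzip": "w:gz", "bzip2": "w:bz2", "xz (LZMA2)": "w:xz"}


def tar_bytes(files: dict, mode: str) -> bytes:
    """Pack a {name: content} mapping into an in-memory compressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def untar_bytes(blob: bytes) -> dict:
    """Read an archive back; mode 'r:*' detects the compression automatically."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as tar:
        for member in tar.getmembers():
            extracted = tar.extractfile(member)
            if extracted is not None:  # None for directories, links, etc.
                out[member.name] = extracted.read()
    return out


if __name__ == "__main__":
    # Hypothetical backup content.
    files = {"backup/dump.sql": b"INSERT INTO t VALUES (1, 'x');\n" * 5000}
    for label, mode in MODES.items():
        blob = tar_bytes(files, mode)
        assert untar_bytes(blob) == files
        print(f"{label:<11} {len(blob):>8} bytes")
```

Because reading uses "r:*", the restore side does not need to know which compressor was used for the backup.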


14 thoughts on “Compression: gzip vs bzip2 vs 7-zip”

  1. I find the following settings to be optimal for text compression (specifically JavaDocs)…
    …it uses PPM rather than LZMA, which is better for text.

  2. I have been playing with paq8o8 for a while. It is known to be the very best compression software available (i.e. it has won the Hutter Prize repeatedly), but the amount of RAM and time it needs makes it unusable.

    If you are looking for the best possible compression and memory/time is not an issue, then you can go crazy with something like a context mixer with lots of file-format models hard-coded in assembly. It performs far better than any LZMA, LZ77, deflate or Huffman implementation. It wins.

    In my case, a difference of +/- 15 MB when compressing a 2 GB file is acceptable, so I believe going mad for a tiny percentage of extra compression is pointless. I'm using bzip2 now because its time/compression trade-off is good. I'm not waiting hours for compression/decompression anymore.

  3. One thing that you don't mention is multithreading. My understanding (and please, someone correct me if I'm wrong) is that bzip2 and 7-zip can use multiple cores (threads) simultaneously, whereas gzip cannot. This effectively makes 7-zip compression much faster on multi-core systems, while still compressing better than gzip or bzip2.

  4. I performed a similar test a while ago.

    I knew gzip would not compress as well, so I left it out of the test.

    I was rather surprised that bzip2 came out ahead of 7-zip in both compression ratio and speed.

    About the DB (a single table):
    Incrementing ID
    Two VARCHAR columns of 30 and 50 characters, respectively
    INT column (incremented on update)
    A timestamp

    More than 50,000 rows
    The table totaled around 10 MiB as an SQL dump

    I figure it is highly dependent on the data provided.

  5. Yes, 7-zip takes advantage of multiple cores/CPUs. For bzip2 and gzip there are variants of the original tools (pbzip2 and mgzip) that do the same.
    I ran the test on a single-core machine to measure the processing power the tools consume, deliberately ignoring whether their implementations support parallel execution.

  6. I tested p7zip vs bzip2 using the following parameters:
    7za a -t7z -mx=9
    Input: a tar file of a MySQL database
    Original size: 1083 MB
    7za: 120290 K
    bzip2: 176182 K

  7. If you need speed, an even better choice than gzip is LZO compression and its tool lzop. It compresses much worse than gzip, but much faster. It is a good choice for backup compression on slow machines, where gzipping takes too long.
    Input: 274 MB tar of source files.
    gzip: 62 MB in 23 seconds.
    lzop: 95 MB in 3.5 seconds (limited by hard drive speed).
    Decompression is incredibly fast, in some cases even faster than memcpy. I often use LZO in embedded bootloaders to decompress the program image (similar to the way the Linux kernel is compressed with gzip).

  8. In fact, it is not… I have a test sample of many saved HTML pages without any bitmap data, only HTML text, about 1.2 GB in total. With PPMd at normal settings, the result is 51 MB; with LZMA, it is 11 MB.
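Comments 3 and 5 mention parallel variants such as pbzip2. The general idea behind that kind of tool can be sketched in Python: split the input into fixed-size chunks, compress each chunk independently on a thread pool, and concatenate the resulting bzip2 streams in order. This is an illustration, not pbzip2's actual code; it assumes CPython's bz2 module releases the GIL while compressing so the threads can run on separate cores, and the chunk size and worker count are arbitrary choices. The output is a multi-stream .bz2 file that bz2.decompress (and bzip2 -d) can read, at the cost of slightly worse compression because no chunk can reference another.

```python
import bz2
import os
from concurrent.futures import ThreadPoolExecutor

# Illustrative defaults only; pbzip2's own settings may differ.
CHUNK_SIZE = 900 * 1024          # about one bzip2 block at level 9
WORKERS = os.cpu_count() or 1    # one thread per CPU core


def parallel_bz2_compress(data: bytes, workers: int = WORKERS,
                          chunk_size: int = CHUNK_SIZE) -> bytes:
    """Compress fixed-size chunks concurrently and join the bzip2 streams in order."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]  # still emit one valid (empty) stream
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the streams stay in sequence.
        return b"".join(pool.map(bz2.compress, chunks))


if __name__ == "__main__":
    # Hypothetical test input: under 2 MB of CSV-like text.
    sample = b"".join(b"%d,user%d,%d\n" % (i, i % 977, i * 7 % 1000)
                      for i in range(100_000))
    packed = parallel_bz2_compress(sample)
    # bz2.decompress handles concatenated streams (Python 3.3+).
    assert bz2.decompress(packed) == sample
    print(f"{len(sample)} -> {len(packed)} bytes "
          f"({len(packed) / len(sample):.1%}) using {WORKERS} threads")
```

Larger chunks compress slightly better; smaller chunks spread the work more evenly across cores.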
