Compression: gzip vs bzip2 vs 7-zip
A trade-off between time and space
Today I had a look at the different options to compress files (in this case for backup purposes) on an Ubuntu system. The most common tools to compress files are gzip and bzip2. Both have been around for a long time, are available on most systems by default, and integrate nicely with other utilities such as GNU tar (through its -z and -j options).
7-zip and the algorithm it uses (LZMA) are not that common on UNIX-like operating systems. 7-zip is well known as a free alternative to WinZip on Windows systems and was started back in 1998. For Ubuntu, p7zip – a port of 7-zip to POSIX – is available in universe (sudo apt-get install p7zip).
My test file was a MySQL dump with a size of 163 MB that contains mostly text. I was interested in the compressed file size and in the time it takes to compress and uncompress the file.
Here are the results:
| Compressor | Compressed size | Ratio (compressed / original) | Compression time | Decompression time |
|---|---|---|---|---|
| gzip | 89 MB | 54 % | 0m 13s | 0m 05s |
| bzip2 | 81 MB | 49 % | 1m 30s | 0m 20s |
| 7-zip | 61 MB | 37 % | 1m 48s | 0m 11s |
For the test I ran all tools with their default settings, i.e. without providing any special options.
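The measurement can be approximated with a short script. The following is only a sketch, not the procedure used for the table above: it uses Python's standard-library codecs instead of the command-line tools, `lzma` (the xz format) stands in for 7-zip, and the sample data is a hypothetical substitute for the MySQL dump. Note that Python's `gzip` module defaults to level 9 while the `gzip` command defaults to level 6, so the level is set explicitly here.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data: bytes) -> list:
    """Compress and decompress `data` with each codec and time both steps."""
    codecs = {
        # Level 6 mirrors the default of the gzip command-line tool.
        "gzip": (lambda d: gzip.compress(d, compresslevel=6), gzip.decompress),
        "bzip2": (bz2.compress, bz2.decompress),
        # xz container with LZMA2; an assumption standing in for 7-zip.
        "lzma (xz)": (lzma.compress, lzma.decompress),
    }
    results = []
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        compress_seconds = time.perf_counter() - start

        start = time.perf_counter()
        unpacked = decompress(packed)
        decompress_seconds = time.perf_counter() - start

        assert unpacked == data  # the round trip must be lossless
        results.append({
            "name": name,
            "size": len(packed),
            "ratio": len(packed) / len(data),  # compressed / original
            "compress_s": compress_seconds,
            "decompress_s": decompress_seconds,
        })
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for the dump: repetitive SQL-like text.
    sample = b"".join(
        b"INSERT INTO t VALUES (%d, 'name%d', 'value%d');\n" % (i, i, i * 7)
        for i in range(50_000)
    )
    for row in benchmark(sample):
        print(f"{row['name']:10s} {row['size']:>10d} B  {row['ratio']:6.1%}  "
              f"{row['compress_s']:.2f}s  {row['decompress_s']:.2f}s")
```

The ratio is computed the same way as in the table (compressed size divided by original size), so smaller is better. Timings from a script like this depend heavily on the machine and the input, so they are not directly comparable to the figures above.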
Gzip is still a great tool and provides good compression without consuming a lot of computation power. Bzip2 is much slower (1m 30s versus 13s) and provides only slightly better compression (81 MB versus 89 MB). 7-zip consumes a few more cycles than bzip2 but produces far smaller compressed files (61 MB). Decompression is also faster with 7-zip than with bzip2 (11s versus 20s).
So if time is important (think of on-the-fly compression), gzip is the tool of choice. If you don't care too much about processing speed and need very good compression, have a look at 7-zip. The only advantage bzip2 has over 7-zip is that bzip2 is part of most default installations and is more common. Let's hope this will change in the future; integration with GNU tar in particular would be great.
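To illustrate what tar integration looks like from a script, here is a minimal sketch using Python's standard `tarfile` module, which accepts the write modes `w:gz`, `w:bz2` and `w:xz`. The xz mode is LZMA-based but is not the 7z container that 7-zip writes by default, so treat it as a related format rather than an equivalent. The helper names `make_archive` and `list_archive` are made up for this example.

```python
import io
import tarfile


def make_archive(files: dict, mode: str) -> bytes:
    """Pack `files` (name -> bytes) into an in-memory tar archive.

    `mode` is a tarfile write mode such as "w:gz", "w:bz2" or "w:xz".
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=mode) as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)  # tar headers need the size up front
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def list_archive(blob: bytes) -> list:
    """Return member names; "r:*" lets tarfile detect the compression."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as archive:
        return archive.getnames()


if __name__ == "__main__":
    # Hypothetical file name and content, for illustration only.
    files = {"dump.sql": b"INSERT INTO t VALUES (1);\n" * 2000}
    for mode in ("w:gz", "w:bz2", "w:xz"):
        blob = make_archive(files, mode)
        print(mode, len(blob), list_archive(blob))
```

Because the compression is selected by a single mode string, switching a backup script between the three formats is a one-word change.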
Re: Compression: gzip vs bzip2 vs 7-zip
I have been playing with paq8o8 for a while. It is regarded as one of the very best compressors available (the PAQ family has won the Hutter Prize repeatedly), but the amounts of RAM and time it needs make it unusable in practice.
If you are looking for the best compression ratio and memory and time are not an issue, then you can go crazy with something like a context mixer with lots of file-format models hardcoded in assembly. It performs far better than any LZMA, LZ77, deflate or Huffman implementation. It wins.
In my case, a difference of +/- 15 MB when compressing a 2 GB file is acceptable, so I believe that going mad for a tiny percentage of extra compression is useless. I'm using bzip2 now because its trade-off between time and compression is good. I'm not waiting hours for compression/decompression anymore.
Re: Compression: gzip vs bzip2 vs 7-zip
The test was performed on a single-core machine in order to measure the processing power the tools consume and to ignore the (lack of) support for parallel execution in the implementations of the algorithms.
Re: Compression: gzip vs bzip2 vs 7-zip
I knew gzip would not compress as well, so I left it out of the test.
I was rather surprised that bzip2 came out ahead of 7-zip in both compression ratio and speed.
About the DB (a single table):
An incrementing ID,
Two VarChar columns of 30 and 50 characters, respectively.
An Int column (increments on update).
A timestamp.
More than 50,000 rows.
The table totaled around 10 MiB (in SQL syntax).
I figure it is highly dependent on the data provided.
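One way to probe that dependence is to generate a table of roughly this shape and compress it with both algorithms. The sketch below is an assumption-heavy stand-in: the table name `t`, the random lowercase text, the counter range and the timestamp values are all invented for illustration, and `lzma` (xz) is used in place of 7-zip. Real column contents would compress differently.

```python
import bz2
import lzma
import random
import string


def synthetic_dump(rows: int, seed: int = 0) -> bytes:
    """Build an SQL-like dump: ID, two text columns (30 and 50 chars),
    an integer counter and a timestamp per row. All values are made up."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    letters = string.ascii_lowercase + " "
    lines = []
    for row_id in range(1, rows + 1):
        short = "".join(rng.choices(letters, k=30))
        long_ = "".join(rng.choices(letters, k=50))
        counter = rng.randint(0, 20)
        stamp = f"2008-01-{rng.randint(1, 28):02d} {rng.randint(0, 23):02d}:00:00"
        lines.append(
            f"INSERT INTO t VALUES ({row_id}, '{short}', '{long_}', {counter}, '{stamp}');"
        )
    return "\n".join(lines).encode("ascii")


def compare(data: bytes) -> dict:
    """Return the original and compressed sizes in bytes."""
    return {
        "original": len(data),
        "bzip2": len(bz2.compress(data)),
        "lzma": len(lzma.compress(data)),
    }


if __name__ == "__main__":
    for name, size in compare(synthetic_dump(50_000)).items():
        print(f"{name:10s} {size:>10d} B")
```

Swapping the random text for more repetitive or more structured values is a quick way to see how much the ranking of the two compressors shifts with the input.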
Re: Compression: gzip vs bzip2 vs 7-zip
Input: 274 MB tar of source files.
gzip: 62 MB in 23 seconds.
lzop: 95 MB in 3.5 seconds (limited by hard drive speed).
Decompression is incredibly fast, in some cases even faster than memcpy. I often use LZO in the bootloader of embedded applications to decompress the program image (similar to the way the Linux kernel is compressed with gzip).