Saturday, April 13, 2013

C# array compress and decompress with GZipStream

A small utility class for compressing and decompressing byte arrays with the GZip algorithm, which is built into the .NET Framework through the GZipStream class.
The GZipStream class works with streams, so we need to create a stream to be able to use it: the GZipStream writes its output to a stream and reads its input from a stream.

Compress

For the Compress function we create a MemoryStream and wrap it in a using statement so that it is correctly disposed when it leaves scope. Inside it we create a GZipStream, pass it our MemoryStream and set the compression level to Optimal.
The available compression levels are:

  • Optimal. Can take a little longer, but compresses the data as well as it can.
  • Fastest. Compresses quickly; the data might not be compressed as far as possible, but sometimes this is good enough.
  • NoCompression. Does not compress the data, it only wraps it in the gzip format.
Then we just write the byte array to the GZipStream and, once the GZipStream has been disposed so that all of its output is flushed, call ToArray() on the MemoryStream to get the compressed byte array.


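using System;
using System.IO;
using System.IO.Compression;

public static class Compression
{
    public static byte[] Compress(byte[] data)
    {
        using (var ms = new MemoryStream())
        {
            // dispose the GZipStream before calling ToArray()
            // so that the gzip footer is flushed to the MemoryStream
            using (var gzip = new GZipStream(ms, CompressionLevel.Optimal))
            {
                gzip.Write(data, 0, data.Length);
            }
            return ms.ToArray();
        }
    }
    public static byte[] Decompress(byte[] data)
    {
        // the trick is to read the last 4 bytes to get the length
        // gzip appends this to the array when compressing
        var lengthBuffer = new byte[4];
        Array.Copy(data, data.Length - 4, lengthBuffer, 0, 4);
        int uncompressedSize = BitConverter.ToInt32(lengthBuffer, 0);
        var buffer = new byte[uncompressedSize];
        using (var ms = new MemoryStream(data))
        {
            using (var gzip = new GZipStream(ms, CompressionMode.Decompress))
            {
                // Read may return fewer bytes than requested,
                // so keep reading until the buffer is full
                int offset = 0;
                while (offset < uncompressedSize)
                {
                    int read = gzip.Read(buffer, offset, uncompressedSize - offset);
                    if (read == 0)
                    {
                        throw new EndOfStreamException("Compressed data ended before the expected length was read.");
                    }
                    offset += read;
                }
            }
        }
        return buffer;
    }
}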
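
To get a feel for the compression levels listed earlier, the following sketch compresses the same input with a level passed in as a parameter. LevelDemo and CompressWith are hypothetical names introduced here for illustration only; they are not part of the class above.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration; not part of the Compression class.
public static class LevelDemo
{
    // Same approach as Compress, but the compression level is a parameter.
    public static byte[] CompressWith(byte[] data, CompressionLevel level)
    {
        using (var ms = new MemoryStream())
        {
            using (var gzip = new GZipStream(ms, level))
            {
                gzip.Write(data, 0, data.Length);
            }
            // The GZipStream is disposed here, so the output is complete.
            return ms.ToArray();
        }
    }

    // Prints the compressed size of the same input for each level.
    public static void PrintSizes(byte[] data)
    {
        var levels = new[]
        {
            CompressionLevel.Optimal,
            CompressionLevel.Fastest,
            CompressionLevel.NoCompression
        };
        foreach (var level in levels)
        {
            Console.WriteLine("{0}: {1} bytes", level, CompressWith(data, level).Length);
        }
    }
}
```

Calling LevelDemo.PrintSizes with a repetitive input, for example a 10000 byte array with a repeating pattern, should show a much smaller result for Optimal and Fastest, while NoCompression comes out slightly larger than the input because the gzip header and footer are still added.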

Decompress

For the decompress part we start by figuring out the uncompressed size: we read the last four bytes of the input array and convert them to an integer.
After that we allocate the output byte array with the uncompressed size.
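
The length lookup can be sketched in isolation. The gzip format (RFC 1952) ends with a 4-byte field that holds the uncompressed size modulo 2^32 in little-endian byte order. BitConverter uses the byte order of the machine it runs on, so this sketch assembles the value by hand instead. GzipTrailer and ReadUncompressedSize are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Compression class.
public static class GzipTrailer
{
    public static uint ReadUncompressedSize(byte[] gzipData)
    {
        if (gzipData == null) throw new ArgumentNullException("gzipData");
        // A gzip member has at least a 10-byte header and an 8-byte footer.
        if (gzipData.Length < 18)
        {
            throw new ArgumentException("Too short to be gzip data.", "gzipData");
        }

        // The last four bytes are the size field, least significant byte first.
        int i = gzipData.Length - 4;
        return unchecked((uint)(gzipData[i]
            | (gzipData[i + 1] << 8)
            | (gzipData[i + 2] << 16)
            | (gzipData[i + 3] << 24)));
    }
}
```

Because the field is stored modulo 2^32, it only matches the real size for data smaller than 4 GiB, and for input made of several concatenated gzip members it describes only the last member.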
Then we again use a MemoryStream and a GZipStream; this time we pass in the CompressionMode.Decompress enum member to specify that we want to decompress the stream.
Note that we call the MemoryStream constructor with the input byte array and that we use the GZipStream to read into the resulting buffer, from position 0 up to the uncompressed size of the data.
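
As an alternative that does not rely on the trailing length bytes, the decompressed data can be copied into a second MemoryStream with Stream.CopyTo, which keeps reading until the GZipStream reports the end of the data. This is a sketch of a different technique, not the approach described above; StreamingDecompression and DecompressToEnd are hypothetical names.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical alternative for illustration; not the method from the post.
public static class StreamingDecompression
{
    public static byte[] DecompressToEnd(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            // CopyTo reads from gzip until it returns 0 bytes,
            // so the output grows to whatever size the data turns out to be.
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The trade-off is that the output MemoryStream grows as needed and ToArray() makes one final copy, whereas the approach above allocates the result array once at its final size.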

Hope this helps someone out there!


2 comments:

  1. Is there no way to toggle line numbers off or to do a clean copy?

  2. Hi there,
    Thank you for your comment. Better late than sorry.
    I've started to clean up all code postings on the blog and remove the annoying line numbers. Should be cleaner now.
