
In-Memory Compression and Decompression Using ZLIB

DanRollins
zlib is a free compression library (available as a DLL) that implements deflate, the same compression method used by the popular gzip utility.  In this article, we'll see how to use the zlib functions to compress and decompress data in memory; that is, without needing to use a temporary file.  We'll be coding in C++ in Visual Studio.

This seems like it would be a trivial problem, and indeed, it turns out to be relatively easy.  The difficulty is in wading through the zlib documentation and figuring out what not to do!  As with many open-source projects, the documentation is written in some sort of unix-geek-speak, and the simple route for this most basic of needs is buried among lots of special options and rarely-needed functionality.

To get started, you will need to download the pre-built zlib DLL.  The zlib DLL page is here:
       http://www.winimage.com/zLibDll/index.html
and the direct link to the ZIP file containing the DLL is here:
       http://www.winimage.com/zLibDll/zlib125dll.zip

It would be way too easy for us if the zlib geniuses had provided the needed C++ header files with the download, so you also need to download the source for the entire library (if I may make an editorial comment here: Sheesh!).  Get the ZIP file of the entire library source code here:
       http://zlib.net/zlib125.zip
We won't be building the library, but you will need to get the two files:
       zlib.h
and
       zconf.h
and put them into your project directory.

The Code

To make this example as simple and universal as possible, we'll just throw together a console application.  Use all of the Visual Studio App Wizard defaults, and then copy the following into the main (only) CPP file:
#include "stdafx.h"   // precompiled header from the App Wizard; not needed otherwise
#define ZLIB_WINAPI   // required so the declarations match the DLL's calling convention

#include "windows.h"  // get BYTE et al.
#include <string.h>   // strlen (used in main)
#include "zlib.h"     // declare the external fns -- uses zconf.h, too
#pragma comment(lib, "zlibwapi.lib") // for access to the DLL

int GetMaxCompressedLen( int nLenSrc ) 
{
    int n16kBlocks = (nLenSrc+16383) / 16384; // round up any fraction of a block
    return ( nLenSrc + 6 + (n16kBlocks*5) );
}
int CompressData( const BYTE* abSrc, int nLenSrc, BYTE* abDst, int nLenDst )
{
    z_stream zInfo ={0};
    zInfo.total_in=  zInfo.avail_in=  nLenSrc;
    zInfo.total_out= zInfo.avail_out= nLenDst;
    zInfo.next_in= (BYTE*)abSrc;
    zInfo.next_out= abDst;

    int nErr, nRet= -1;
    nErr= deflateInit( &zInfo, Z_DEFAULT_COMPRESSION ); // zlib function
    if ( nErr == Z_OK ) {
        nErr= deflate( &zInfo, Z_FINISH );              // zlib function
        if ( nErr == Z_STREAM_END ) {
            nRet= zInfo.total_out;
        }
    }
    deflateEnd( &zInfo );    // zlib function
    return( nRet );
}

int UncompressData( const BYTE* abSrc, int nLenSrc, BYTE* abDst, int nLenDst )
{
    z_stream zInfo ={0};
    zInfo.total_in=  zInfo.avail_in=  nLenSrc;
    zInfo.total_out= zInfo.avail_out= nLenDst;
    zInfo.next_in= (BYTE*)abSrc;
    zInfo.next_out= abDst;

    int nErr, nRet= -1;
    nErr= inflateInit( &zInfo );               // zlib function
    if ( nErr == Z_OK ) {
        nErr= inflate( &zInfo, Z_FINISH );     // zlib function
        if ( nErr == Z_STREAM_END ) {
            nRet= zInfo.total_out;
        }
    }
    inflateEnd( &zInfo );   // zlib function
    return( nRet ); // -1 or len of output
}


This defines three functions that simplify using the library for in-memory operation:

GetMaxCompressedLen -- Pass in the length of the source buffer and it returns the maximum size needed for the compressed output.  

It should not be too shocking to learn that after running a lossless compression function, it is possible for the output to be larger than the input.  Compression algorithms work by finding common repeated sequences.  If there are no (or very few) repetitions, as in JPG files and encrypted files, then the algorithm can't find any air to squeeze out.  There is a minor amount of overhead (6 bytes overall and 5 bytes per 16K block, as documented on the zlib website), and this function takes that into consideration.

CompressData -- Pass in the address of the source (uncompressed) buffer and its length and provide an output buffer and its length.  Use GetMaxCompressedLen when setting up these final two parameters.

UncompressData -- Pass in the address of the source buffer (the compressed data) and its length and provide an output buffer and its length.  The output buffer needs to be large enough to hold the uncompressed output.

For UncompressData, you should save the original data length (before compression) and use it when preparing the last two parameters.  In this simplified example, I don't provide any means to estimate the output length (note: the zlib algorithm can reach compression ratios as high as 1000-to-1 in certain extreme cases, so the compressed size alone tells you very little).
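
To make the size estimate concrete, here is the same arithmetic as GetMaxCompressedLen with a few worked values.  The name MaxCompressedLenSketch is only a stand-in so the snippet can be read on its own.  Note that zlib also exports compressBound() and deflateBound(), which return a slightly larger figure; the zlib manual only guarantees that a single deflate() call with Z_FINISH completes when the output buffer is at least deflateBound() bytes, so those functions are the more conservative choice.

```cpp
#include <cassert>

// Same arithmetic as GetMaxCompressedLen() in the listing above, repeated
// under a stand-in name so the worked numbers can be checked in isolation.
int MaxCompressedLenSketch( int nLenSrc )
{
    assert( nLenSrc >= 0 );
    int n16kBlocks = (nLenSrc + 16383) / 16384;  // round up to whole 16K blocks
    return nLenSrc + 6 + (n16kBlocks * 5);       // 6 bytes overall + 5 per block
}

// Worked values:
//       30 bytes -> 1 block  ->     30 + 6 +  5 =     41
//    16384 bytes -> 1 block  ->  16384 + 6 +  5 =  16395
//    16385 bytes -> 2 blocks ->  16385 + 6 + 10 =  16401
//   100000 bytes -> 7 blocks -> 100000 + 6 + 35 = 100041
```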

Now add the main() function in order to test the compression functions:
int main()
{
    BYTE pbSrc[]="hello hello hello hello there";

    //-------------- compress (save the original length)

    int nLenOrig= strlen( (char*)pbSrc )+1; // include terminating NULL
    int nLenDst= GetMaxCompressedLen( nLenOrig );
    BYTE* pbDst= new BYTE [nLenDst];  // alloc dest buffer

    int nLenPacked= CompressData( pbSrc, nLenOrig, pbDst, nLenDst );
    if ( nLenPacked == -1 ) return(1);  // error

    //-------------- uncompress (uses the saved original length)

    BYTE* pbPacked=   pbDst;
    BYTE* pbUnpacked= new BYTE[ nLenOrig ];

    int nLen= UncompressData( pbPacked, nLenPacked, pbUnpacked, nLenOrig );

    // breakpoint here and view pbUnpacked to confirm (nLen should equal nLenOrig)
    delete[] pbDst;          // arrays allocated with new[] must be freed with delete[]
    delete[] pbUnpacked;
    return 0; 
}
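
The test program keeps the original length in a local variable.  When the packed data is written to a file or sent somewhere, that length has to travel with it.  One possible approach (an assumption of mine, not something zlib requires) is to put a four-byte length in front of the compressed bytes.  StoreOrigLen, LoadOrigLen and kLenHeaderBytes are hypothetical names used only for this sketch.

```cpp
#include <cassert>

const int kLenHeaderBytes = 4;   // hypothetical header: one 32-bit length

// Write nLenOrig into the first four bytes of pbBuf, low byte first,
// so the layout does not depend on the CPU's byte order.
void StoreOrigLen( unsigned char* pbBuf, unsigned int nLenOrig )
{
    assert( pbBuf != 0 );
    for ( int i = 0; i < kLenHeaderBytes; i++ ) {
        pbBuf[i] = (unsigned char)( (nLenOrig >> (8 * i)) & 0xFF );
    }
}

// Read back the length written by StoreOrigLen().
unsigned int LoadOrigLen( const unsigned char* pbBuf )
{
    assert( pbBuf != 0 );
    unsigned int nLenOrig = 0;
    for ( int i = 0; i < kLenHeaderBytes; i++ ) {
        nLenOrig |= (unsigned int)pbBuf[i] << (8 * i);
    }
    return nLenOrig;
}
```

To use it with the functions above, allocate GetMaxCompressedLen( nLenOrig ) + kLenHeaderBytes bytes, call StoreOrigLen( pbDst, nLenOrig ), and pass pbDst + kLenHeaderBytes as the output buffer to CompressData.  On the way back, LoadOrigLen tells you how large a buffer to hand to UncompressData.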



Summary

Programmers like zlib because it is unencumbered by patents and there are no royalties to pay.  The source code is freely available.  You can use the DLL as shown here, or link against the static library version to avoid needing to distribute another executable file.  The "deflate" algorithm (LZ77 plus Huffman coding) gets quite good compression, certainly adequate for most needs.

zlib is designed to use data streams that nearly always come from and go to disk files.  That's what makes the documentation seem complicated.  For in-memory packing and unpacking, all we needed to do was ignore all of the stream handling logic and set up the z_stream structure so that it would do everything in one go.

There is a lot more you can do with zlib.  For instance, you can provide special handling to optimize the output, including selecting a compression strategy (Huffman-only or RLE-only).  You can even set up a compression dictionary in advance -- you might be able to out-perform the on-the-fly algorithms if you know you will be compressing certain specific types of data.

As I said earlier, the library contains many bells and whistles, including a complete set of file handling functions and other things that seem (to me) to be superfluous.  I hope that by providing a cut-to-the-bone example, I've made this tool a little easier for you to use.

References:

zlib Home page:
    http://www.zlib.net/index.html
zlib FAQ
    http://www.zlib.net/zlib_faq.html
zlib programming manual
    http://www.zlib.net/manual.html
gzip Home page
    http://www.gzip.org/index.html

If you want to build the zlib library (e.g., to prune out unwanted functionality), note that there are two VS projects, but you need to look for them:
    (unzip folder)\zlib-1.2.5\contrib\vstudio

5 Comments
 

Expert Comment

by:marikcool
I taked this sample code by guru, hoping that there will be errors¿ and detect memory leak.
just swap deflateEnd( &zInfo ) and deflateEnd( &zInfo ) to fix it. rtfm, bye.

Author Comment

by:DanRollins
Thank you for your feedback!   I have modified the code in the CompressData() and UncompressData() functions so that the correct XxxxxEnd call is made.

-- Dan

Expert Comment

by:Qlemo
Interesting. Swapping  deflateEnd( &zInfo )   and   deflateEnd( &zInfo )  does not change anything :p

Expert Comment

by:marikcool
Qlermo inflateEnd and  deflateEnd

Expert Comment

by:Qlemo
I know. I just expect someone posting with "rtfm" to take more care of being precise and correct.