In-Memory Compression and Decompression Using ZLIB

DanRollins

zlib is a free compression library (used here as a DLL) that implements the same "deflate" algorithm as the popular gzip utility.  In this article, we'll see how to use the zlib functions to compress and decompress data in memory; that is, without needing to use a temporary file.  We'll be coding in C++ in Visual Studio.

This seems like it would be a trivial problem, and indeed, it turns out to be relatively easy.  The difficulty is in wading through the zlib documentation and figuring out what not to do!  The documentation is written in some sort of unix-geek-speak, and it covers so many special options and rarely-needed features that the simplest path is easy to miss.  (zlib does include one-call helpers, compress() and uncompress(), but here we'll drive the z_stream structure directly.)

To get started, you will need to download the pre-built zlib DLL.  The zlib DLL page is here:
       http://www.winimage.com/zLibDll/index.html
and the direct link to the ZIP file containing the DLL is here:
       http://www.winimage.com/zLibDll/zlib125dll.zip

It would be way too easy for us if the zlib geniuses had provided the needed C++ header files with the download, so you also need to download the source for the entire library (if I may make an editorial comment here: Sheesh!).  Get the ZIP file of the entire library source code here:
       http://zlib.net/zlib125.zip
We won't be building the library, but you will need to get the two files:
       zlib.h
and
       zconf.h
and put them into your project directory.

The Code

To make this example as simple and universal as possible, we'll just throw together a console application.  Use all of the Visual Studio App Wizard defaults, and then copy the following into the main (only) CPP file:
    #include "stdafx.h"   // needed only if the project uses precompiled headers
    #define ZLIB_WINAPI   // required: makes the declarations match the DLL's calling convention

    #include <string.h>   // strlen (used in main)
    #include "windows.h"  // get BYTE et al.
    #include "zlib.h"     // declare the external fns -- uses zconf.h, too
    #pragma comment(lib, "zlibwapi.lib") // import library for the DLL

    // Worst case: 6 bytes for the whole stream plus 5 bytes per 16K block.
    int GetMaxCompressedLen( int nLenSrc )
    {
        int n16kBlocks = (nLenSrc+16383) / 16384; // round up any fraction of a block
        return ( nLenSrc + 6 + (n16kBlocks*5) );
    }

    // Returns the compressed length, or -1 on error (e.g., abDst is too small).
    int CompressData( const BYTE* abSrc, int nLenSrc, BYTE* abDst, int nLenDst )
    {
        z_stream zInfo ={0};      // zalloc, zfree, opaque = Z_NULL (use the defaults)
        zInfo.avail_in=  nLenSrc;
        zInfo.avail_out= nLenDst;
        zInfo.next_in=   (BYTE*)abSrc;
        zInfo.next_out=  abDst;
        // total_in and total_out are counters that deflateInit resets to zero

        int nRet= -1;
        int nErr= deflateInit( &zInfo, Z_DEFAULT_COMPRESSION ); // zlib function
        if ( nErr == Z_OK ) {
            nErr= deflate( &zInfo, Z_FINISH );              // zlib function
            if ( nErr == Z_STREAM_END ) {   // all input consumed and flushed
                nRet= zInfo.total_out;
            }
            deflateEnd( &zInfo );    // zlib function: frees the internal state
        }
        return( nRet );
    }

    // Returns the uncompressed length, or -1 on error (e.g., abDst is too small).
    int UncompressData( const BYTE* abSrc, int nLenSrc, BYTE* abDst, int nLenDst )
    {
        z_stream zInfo ={0};      // zalloc, zfree, opaque = Z_NULL (use the defaults)
        zInfo.avail_in=  nLenSrc;
        zInfo.avail_out= nLenDst;
        zInfo.next_in=   (BYTE*)abSrc;
        zInfo.next_out=  abDst;

        int nRet= -1;
        int nErr= inflateInit( &zInfo );               // zlib function
        if ( nErr == Z_OK ) {
            nErr= inflate( &zInfo, Z_FINISH );     // zlib function
            if ( nErr == Z_STREAM_END ) {   // whole stream decoded into abDst
                nRet= zInfo.total_out;
            }
            inflateEnd( &zInfo );   // zlib function: frees the internal state
        }
        return( nRet ); // -1 or len of output
    }

This defines three functions that simplify using the library for in-memory operation:

GetMaxCompressedLen -- Pass in the length of the source buffer and it returns the maximum size needed for the compressed output.

It should not be too shocking to learn that a lossless compression function can produce output that is larger than its input.  Compression algorithms work by finding repeated sequences.  If there are no (or very few) repetitions, as in JPG files and encrypted files, then the algorithm can't find any air to squeeze out.  There is also a small amount of overhead (6 bytes overall and 5 bytes per 16K block, as documented on the zlib website), and this function takes that into account.

CompressData -- Pass in the address of the source (uncompressed) buffer and its length, and provide an output buffer and its length.  Use GetMaxCompressedLen to size the output buffer.  It returns the compressed length, or -1 on error.

UncompressData -- Pass in the address of the source buffer (the compressed data) and its length, and provide an output buffer and its length.  The output buffer needs to be large enough to hold the uncompressed output.  It returns the uncompressed length, or -1 on error.
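
To make the overhead arithmetic behind GetMaxCompressedLen concrete, here is a small worked sketch. The function body is repeated from the article only so that the snippet compiles on its own; the sample sizes are arbitrary illustrations, except for 30, which is the length of the test string used later (29 characters plus the terminator).

```cpp
// Repeated from the article so this snippet compiles on its own.
int GetMaxCompressedLen(int nLenSrc)
{
    int n16kBlocks = (nLenSrc + 16383) / 16384; // round up to whole 16K blocks
    return nLenSrc + 6 + (n16kBlocks * 5);      // 6 per stream + 5 per block
}

// Worked values:
//   GetMaxCompressedLen(30)      == 41       (1 block:  30 + 6 + 5)
//   GetMaxCompressedLen(16384)   == 16395    (1 block:  exactly fills one block)
//   GetMaxCompressedLen(16385)   == 16401    (2 blocks: one extra byte opens a new block)
//   GetMaxCompressedLen(1000000) == 1000316  (62 blocks: 6 + 62*5 = 316 bytes of overhead)
```

The overhead stays tiny relative to the data (316 bytes on a megabyte), so sizing the destination buffer this way costs very little extra memory.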

You should save the original data length (before compression) and use it to size the output buffer that you pass to UncompressData.  In this simplified example, I don't provide any means to estimate the output length.  (Note: zlib can reach compression ratios of roughly 1000-to-1 in certain extreme cases, so the compressed size is a poor guide to the size of the output.)
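
One common way to "save the original length" is to store it in front of the compressed bytes. The sketch below is only an illustration of that idea: AddLengthPrefix and ReadLengthPrefix are hypothetical helper names (they are not part of zlib or of the code above), the 4-byte little-endian layout is an assumption, and standard types are used instead of BYTE so the snippet compiles without windows.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns [4-byte little-endian original length][packed bytes].
std::vector<unsigned char> AddLengthPrefix(std::uint32_t nLenOrig,
                                           const unsigned char* pbPacked,
                                           std::size_t nLenPacked)
{
    std::vector<unsigned char> out;
    out.reserve(4 + nLenPacked);
    for (int shift = 0; shift < 32; shift += 8)   // low byte first
        out.push_back(static_cast<unsigned char>((nLenOrig >> shift) & 0xFF));
    out.insert(out.end(), pbPacked, pbPacked + nLenPacked);
    return out;
}

// Hypothetical helper: reads the prefix back.
// Returns false if the buffer is too short to contain one.
bool ReadLengthPrefix(const std::vector<unsigned char>& blob,
                      std::uint32_t& nLenOrig)
{
    if (blob.size() < 4) return false;
    nLenOrig = 0;
    for (int i = 0; i < 4; ++i)
        nLenOrig |= static_cast<std::uint32_t>(blob[i]) << (8 * i);
    return true;
}
```

On the receiving side you would call ReadLengthPrefix, allocate a buffer of that many bytes, and pass the data starting at offset 4 (with a length of blob.size() - 4) to UncompressData. If the data can come from an untrusted source, check the stored length against a sensible upper limit before allocating.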

Now add the main() function in order to test the compression functions:
    int main()
    {
        BYTE pbSrc[]="hello hello hello hello there";

        //-------------- compress (save the original length)

        int nLenOrig= (int)strlen( (char*)pbSrc )+1; // include terminating NUL
        int nLenDst= GetMaxCompressedLen( nLenOrig );
        BYTE* pbDst= new BYTE [nLenDst];  // alloc dest buffer

        int nLenPacked= CompressData( pbSrc, nLenOrig, pbDst, nLenDst );
        if ( nLenPacked == -1 ) {
            delete[] pbDst;
            return(1);  // error
        }

        //-------------- uncompress (uses the saved original length)

        BYTE* pbPacked=   pbDst;
        BYTE* pbUnpacked= new BYTE[ nLenOrig ];

        int nLen= UncompressData( pbPacked, nLenPacked, pbUnpacked, nLenOrig );
        int nResult= ( nLen == nLenOrig ) ? 0 : 1;  // nLen is -1 if decompression failed

        // breakpoint here and view pbUnpacked to confirm
        delete[] pbDst;            // arrays allocated with new[] need delete[]
        delete[] pbUnpacked;
        return nResult;
    }

Summary

Programmers like zlib because it is unencumbered by patents and there are no royalties to pay.  The source code is freely available.  You can use the DLL as shown here, or link against the static library version to avoid needing to distribute another executable file.  The "deflate" (Huffman/LZ77) algorithm gets quite good compression, certainly adequate for most needs.

zlib is designed to use data streams that nearly always come from and go to disk files.  That's what makes the documentation seem complicated.  For in-memory packing and unpacking, all we needed to do was ignore all of the stream handling logic and set up the z_stream structure so that it would do everything in one go.

There is a lot more you can do with zlib.  For instance, you can provide special handling to optimize the output, including selecting a compression strategy (Huffman-only or RLE-only), and you can even set up a compression dictionary in advance -- you might be able to out-perform the on-the-fly algorithms if you know you will be compressing certain specific types of data.

As I said earlier, the library contains many bells and whistles, including a complete set of file handling functions and other things that seem (to me) to be superfluous.  I hope that by providing a cut-to-the-bone example, I've made this tool a little easier for you to use.

References:

zlib Home page
    http://www.zlib.net/index.html
zlib FAQ
    http://www.zlib.net/zlib_faq.html
zlib programming manual
    http://www.zlib.net/manual.html
gzip Home page
    http://www.gzip.org/index.html

If you want to build the zlib library (e.g., to prune out unwanted functionality), note that there are two VS projects, but you need to look for them:
    (unzip folder)\zlib-1.2.5\contrib\vstudio

Comments (5)

I tried this sample code from the guru, hoping that there would be errors, and detected a memory leak.
Just swap deflateEnd( &zInfo ) and deflateEnd( &zInfo ) to fix it. RTFM, bye.

DanRollins (Author) commented:
Thank you for your feedback!   I have modified the code in the CompressData() and UncompressData() functions so that the correct XxxxxEnd call is made.

-- Dan

Qlemo commented:
Interesting. Swapping  deflateEnd( &zInfo )   and   deflateEnd( &zInfo )  does not change anything :p
Qlemo, inflateEnd and deflateEnd.

Qlemo commented:
I know. I just expect someone posting with "rtfm" to take more care of being precise and correct.
