Our community of experts have been thoroughly vetted for their expertise and industry experience. Experts with Gold status have received one of our highest-level Expert Awards, which recognize experts for their valuable contributions.
zlib is a free compression library (a DLL) on which the popular gzip utility is built. In this article, we'll see how to use the zlib functions to compress and decompress data in memory; that is, without needing to use a temporary file. We'll be coding in C++ in Visual Studio.
This seems like it would be a trivial problem, and indeed, it turns out to be relatively easy. The difficulty is in wading through the zlib documentation and figuring out what not to do! As with many open-source projects, the documentation is written in some sort of unix-geek-speak, and for some reason, the authors included lots of special options and rarely-needed functionality, while the simple path for this most basic of needs is easy to miss.
It would be way too easy for us if the zlib geniuses had provided the needed C++ header files with the download, so you also need to download the source for the entire library (if I may make an editorial comment here: Sheesh!). Get the ZIP file of the entire library source code here: http://zlib.net/zlib125.zip
We won't be building the library, but you will need to get the two files:
zlib.h
and
zconf.h
and put them into your project directory.
The Code
To keep this example as simple and universal as possible, we'll just throw together a console application. Use all of the Visual Studio App Wizard defaults, and then copy the following into the main (only) CPP file:
#include "stdafx.h"   // not actually needed
#define ZLIB_WINAPI   // actually actually needed (for linkage)

#include "windows.h"  // get BYTE et al.
#include "zlib.h"     // declare the external fns -- uses zconf.h, too
#pragma comment(lib, "zlibwapi.lib") // for access to the DLL

int GetMaxCompressedLen( int nLenSrc )
{
    int n16kBlocks = (nLenSrc+16383) / 16384; // round up any fraction of a block
    return ( nLenSrc + 6 + (n16kBlocks*5) );
}

int CompressData( const BYTE* abSrc, int nLenSrc, BYTE* abDst, int nLenDst )
{
    z_stream zInfo ={0};
    zInfo.total_in=  zInfo.avail_in=  nLenSrc;
    zInfo.total_out= zInfo.avail_out= nLenDst;
    zInfo.next_in=  (BYTE*)abSrc;
    zInfo.next_out= abDst;

    int nErr, nRet= -1;
    nErr= deflateInit( &zInfo, Z_DEFAULT_COMPRESSION ); // zlib function
    if ( nErr == Z_OK ) {
        nErr= deflate( &zInfo, Z_FINISH );              // zlib function
        if ( nErr == Z_STREAM_END ) {
            nRet= zInfo.total_out;
        }
    }
    deflateEnd( &zInfo );    // zlib function
    return( nRet );
}

int UncompressData( const BYTE* abSrc, int nLenSrc, BYTE* abDst, int nLenDst )
{
    z_stream zInfo ={0};
    zInfo.total_in=  zInfo.avail_in=  nLenSrc;
    zInfo.total_out= zInfo.avail_out= nLenDst;
    zInfo.next_in=  (BYTE*)abSrc;
    zInfo.next_out= abDst;

    int nErr, nRet= -1;
    nErr= inflateInit( &zInfo );               // zlib function
    if ( nErr == Z_OK ) {
        nErr= inflate( &zInfo, Z_FINISH );     // zlib function
        if ( nErr == Z_STREAM_END ) {
            nRet= zInfo.total_out;
        }
    }
    inflateEnd( &zInfo );   // zlib function
    return( nRet );         // -1 or len of output
}
This defines three functions that simplify using the library for in-memory operation:
GetMaxCompressedLen -- Pass in the length of the source buffer and it returns the maximum size needed for the compressed output.
It should not be too shocking to learn that after running a lossless compression function, it is possible for the output to be larger than the input. Compression algorithms work by finding common repeated sequences. If there are no (or very few) repetitions, as in JPG files and encrypted files, then the algorithm can't find any air to squeeze out. There is a minor amount of overhead (6 bytes overall and 5 bytes per 16K block, as documented on the website), and this function takes that into consideration.
CompressData -- Pass in the address of the source (uncompressed) buffer and its length and provide an output buffer and its length. Use GetMaxCompressedLen when setting up these final two parameters.
UncompressData -- Pass in the address of the source buffer (the compressed data) and its length and provide an output buffer and its length. The output buffer needs to be large enough to hold the uncompressed output.
You should save the original data length (before compression) and use that in preparing the last two parameters. In this simplified example, I don't provide any means to estimate the output length (Note: The zlib algorithm can get compression as high as 1000-to-1 in certain extreme cases).
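The overhead arithmetic behind GetMaxCompressedLen is easy to check by hand. The sketch below repeats the article's formula under a hypothetical name, MaxCompressedLen, so that it compiles without zlib. It only restates the 6-bytes-plus-5-per-16K-block rule quoted above and is not a bound computed by the library itself.

```cpp
#include <cassert>

// Hypothetical stand-in for GetMaxCompressedLen (same arithmetic):
// 6 bytes for the whole stream plus 5 bytes for each 16 KB block.
int MaxCompressedLen(int nLenSrc)
{
    assert(nLenSrc >= 0);                          // lengths are never negative
    int n16kBlocks = (nLenSrc + 16383) / 16384;    // round up any partial block
    return nLenSrc + 6 + (n16kBlocks * 5);
}

// Worked values:
//   MaxCompressedLen(30)      == 41        (1 block:    30 + 6 + 5)
//   MaxCompressedLen(16384)   == 16395     (1 block:    16384 + 6 + 5)
//   MaxCompressedLen(16385)   == 16401     (2 blocks:   16385 + 6 + 10)
//   MaxCompressedLen(1000000) == 1000316   (62 blocks:  1000000 + 6 + 310)
```

So the 30-byte test string used in main() (29 characters plus the terminator) needs a 41-byte destination buffer. If you would rather have the library work out the bound, zlib also exports compressBound() and deflateBound() for that purpose.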
Now add the main() function in order to test the compression functions:
int main()
{
    BYTE pbSrc[]="hello hello hello hello there";

    //-------------- compress (save the original length)
    int nLenOrig= strlen( (char*)pbSrc )+1; // include terminating NUL
    int nLenDst= GetMaxCompressedLen( nLenOrig );
    BYTE* pbDst= new BYTE [nLenDst];  // alloc dest buffer

    int nLenPacked= CompressData( pbSrc, nLenOrig, pbDst, nLenDst );
    if ( nLenPacked == -1 ) {
        delete[] pbDst;
        return(1);  // error
    }

    //-------------- uncompress (uses the saved original length)
    BYTE* pbPacked=   pbDst;
    BYTE* pbUnpacked= new BYTE[ nLenOrig ];

    int nLen= UncompressData( pbPacked, nLenPacked, pbUnpacked, nLenOrig );

    // breakpoint here and view pbUnpacked to confirm (nLen should equal nLenOrig)
    delete[] pbDst;       // do some cleanup; arrays need delete[], not delete
    delete[] pbUnpacked;
    return 0;
}
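In main() the original length simply lives in the local variable nLenOrig. Once the packed bytes are written to a file or sent somewhere else, that length has to travel with them. One common approach, sketched here under my own assumptions, is to put a small header in front of the packed data. FramePacked and ReadOrigLen are hypothetical helpers, not part of zlib, and the 4-byte little-endian layout is just one possible choice.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: build [4-byte little-endian original length][packed bytes].
std::vector<unsigned char> FramePacked(const unsigned char* pbPacked,
                                       int nLenPacked, int nLenOrig)
{
    assert(nLenPacked >= 0 && nLenOrig >= 0);
    const std::uint32_t n = static_cast<std::uint32_t>(nLenOrig);

    std::vector<unsigned char> out(4 + static_cast<std::size_t>(nLenPacked));
    out[0] = static_cast<unsigned char>( n        & 0xFF);
    out[1] = static_cast<unsigned char>((n >> 8)  & 0xFF);
    out[2] = static_cast<unsigned char>((n >> 16) & 0xFF);
    out[3] = static_cast<unsigned char>((n >> 24) & 0xFF);

    if (nLenPacked > 0) {
        std::memcpy(out.data() + 4, pbPacked, static_cast<std::size_t>(nLenPacked));
    }
    return out;
}

// Hypothetical helper: read the original length back from the header.
// Returns -1 if the buffer is too short or the value does not fit in an int.
int ReadOrigLen(const unsigned char* pbFramed, int nLenFramed)
{
    if (nLenFramed < 4) return -1;
    const std::uint32_t n =
          static_cast<std::uint32_t>(pbFramed[0])
        | (static_cast<std::uint32_t>(pbFramed[1]) << 8)
        | (static_cast<std::uint32_t>(pbFramed[2]) << 16)
        | (static_cast<std::uint32_t>(pbFramed[3]) << 24);
    if (n > 0x7FFFFFFFu) return -1;
    return static_cast<int>(n);
}
```

On the receiving side you would call ReadOrigLen first, allocate an output buffer of that size, and then hand pbFramed + 4 and nLenFramed - 4 to UncompressData as the source buffer and length.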
Programmers like zlib because it is unencumbered by patents and there are no royalties to pay. The source code is freely available. You can use the DLL as shown here, or link into the static library version to avoid needing to distribute another executable file. The "deflate" (Huffman/LZ77) algorithm gets quite good compression, certainly adequate for most needs.
zlib is designed to use data streams that nearly always come from and go to disk files. That's what makes the documentation seem complicated. For in-memory packing and unpacking, all we needed to do was ignore all of the stream handling logic and set up the z_stream structure so that it would do everything in one go.
There is a lot more you can do with zlib. For instance, you can tune the output by selecting a compression strategy (Huffman-only or RLE-only), and you can even set up a compression dictionary in advance. You might be able to outperform the on-the-fly algorithms if you know you will be compressing certain specific types of data.
As I said earlier, the library contains many bells and whistles, including a complete set of file handling functions and other things that seem (to me) to be superfluous. I hope that by providing a cut-to-the-bone example, I've made this tool a little easier for you to use.
If you want to build the zlib library (e.g., to prune out unwanted functionality), note that there are two VS projects, but you need to look for them:
(unzip folder)\zlib-1.2.5\contrib\vstudio
I took this sample code from the guru, expecting that there would be errors, and detected a memory leak.
just swap deflateEnd( &zInfo ) and deflateEnd( &zInfo ) to fix it. rtfm, bye.