On the fly compression/decompression

Posted on 2004-08-06
Last Modified: 2010-04-15
Hi all, been a long time since I need to come here.

I'm looking for a solution to a space problem that I am having. I run a mud, (multi-user-dungeon) and it creates a log for every action of all the players. This takes up alot of space, and I've had to start removing things from teh log to save space, that is a bad solution.

I would do routine compressions of the data if I could, however the mud later on needs the uncompressed data, as we recall the text via a webserver implemented in the mud.

I had the idea of compressing the data using either zlib, or another compression technique, however I am not very well versed in such things.

I would need to compress when writing, and uncompress when reading, without a noticeable speed decrease

My question to you experts: What compression solution should I use (zlib/other?) and is it possible that someone could write me up a quick snippet or code to show me how I would go about compressing/decompressing text on the fly.

Thank you!
Question by:Krule
LVL 55

Expert Comment

by:Jaime Olivares
ID: 11737811
Here is a simple compression library, just compress and uncompress a block of memory into another buffer.


WORD UncompressBlock(BYTE *Source, BYTE *Dest, WORD SourceSize)
      signed short Hash[4096];
      WORD Key,Size,Pos;
      BYTE Bit = 0;
      WORD Command;
      WORD X = 0;
      WORD Y = 3;
      WORD Z = 1;

      for (Key = 0; Key < 4096; Key++)
            Hash[Key] = -1;
      Dest[0] = FLAG_Compress;

      while (X<SourceSize && Y<=SourceSize) {
            /*printf("X = %i Y = %i Z = %i",X,Y,Z);*/
            if (Bit > 15) {
                  Dest[Z++] = (Command >> 8) & 0x00ff;
                  Dest[Z] = Command & 0x00ff;
                  Z = Y;
                  Bit = 0;
                  Y += 2;
            for (Size = 1; Source[X]==Source[X+Size] && Size<0x0fff && (X+Size) < SourceSize; Size++);
            if (Size >= 16) {
                  Dest[Y++] = 0;
                  Dest[Y++] = ((Size - 16) >> 8) & 0x00ff;
                  Dest[Y++] = (Size - 16) & 0x00ff;
                  Dest[Y++] = Source[X];
                  X += Size;
                  Command = (Command << 1) + 1;
            } else if (GetMatch(Source,X,SourceSize,Hash,Size,Pos)) {
                  Key = ((X-Pos) << 4) + (Size - 3);
                  Dest[Y++] = (Key >> 8) & 0x00ff;
                  Dest[Y++] = Key & 0x00ff;
                  X += Size;
                  Command = (Command << 1) + 1;
            } else {
                  Dest[Y++] = Source[X++];
                  Command = (Command << 1);
      Command <<= (16-Bit);
      Dest[Z++] = (Command >> 8) & 0x00ff;
      Dest[Z] = Command & 0x00ff;

      if (Y>SourceSize) {
            for (Y=0; Y<SourceSize; Dest[Y+1]=Source[Y++]);
            Dest[0] = FLAG_Copied;
            return SourceSize+1;
      return Y;


WORD CompressBlock(BYTE *Source, BYTE *Dest, WORD SourceSize)
      WORD Pos, Size, K;
      WORD Command = (Source[1] << 8) + Source[2];
      WORD X = 3;
      WORD Y = 0;
      BYTE Bit = 16;

      if (Source[0] == FLAG_Copied) {
            for (Y=1; Y<SourceSize; Dest[Y-1]=Source[Y++]);
            return SourceSize-1;
      while (X < SourceSize) {
            if (Bit == 0) {
                  Command = (Source[X++] << 8);
                  Command += Source[X++];
                  Bit = 16;
            if (Command & 0x8000) {
                  Pos = (Source[X++] << 4);
                  Pos += (Source[X] >> 4);
                  if (Pos) {
                        Size = (Source[X++] & 0x0f)+3;
                        for (K=0; K<Size; K++)
                              Dest[Y+K] = Dest[Y-Pos+K];
                        Y += Size;
                  } else {
                        Size = (Source[X++] << 8);
                        Size += Source[X++] + 16;
                        for (K=0; K<Size; Dest[Y+K++] = Source[X]);
                        Y += Size;
            } else
                  Dest[Y++] = Source[X++];
            Command <<= 1;
      return Y;

BOOL GetMatch(BYTE *Source, WORD X, WORD SourceSize, signed short *Hash, WORD &Size, WORD &Pos)
      WORD HashValue = (40543L*((((Source[X] << 4) ^ Source[X+1]) << 4) ^ Source[X+2]) >> 4) & 0xfff;
      signed short p = Hash[HashValue];

      Hash[HashValue] = X;
      if (p!=-1 && (X-p)<4096) {
            Pos = p;
            for (Size=0; (Size<18) && (Source[X+Size]==Source[p+Size]) && ((X+Size)<SourceSize); Size++);
            return (Size >= 3);
      return false;

LVL 55

Expert Comment

by:Jaime Olivares
ID: 11737847
Notice about the data type

WORD -> A 16 bits integer
BYTE   -> A 8 bits integer
BOOL  -> An integer, acting as boolean

To compress, call:
CompressBlock(PtrToYourData, PtrToYourCompressedBuffer, OriginalDataSize);
Of course, buffer must be allocated first, with the same size as original data
Also you must store original data size to reserve buffer for uncompress function

To uncompress call:
UncompressBlock(PtrToYourCompressedBuffer, PtrToUncompressedData, CompressedDataSize);
You must allocate uncompressed buffer with original data size


Author Comment

ID: 11737949
Ok...I might have forgotten to mention a step.

I need to compress..write to from disk...uncompress. The problem there is keeping track of the uncompressed datas size when I write to disk.. :\
What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

LVL 55

Expert Comment

by:Jaime Olivares
ID: 11738010
So, what's the problem?

You have some data to store.
Call compress routine to compress it in a buffer.
Open a file
Write uncompressed data size at the very first.
Write compressed block
Close the file

To uncompress:
Open the file
Read uncompressed data size at the very first
Invoke your uncompress routine
Close file

That's all.

Author Comment

ID: 11738297
Oh, that would work ok.

Is this like the best compression I can get, anyone else have suggestions?

Accepted Solution

JugglerW earned 200 total points
ID: 11744135
Look here:

exists for many platforms.
LVL 16

Assisted Solution

PaulCaswell earned 200 total points
ID: 11750634
One I've used, probably not the best, is LZW. It was designed for on-the-fly serial communications compression and is very fast at both compress and uncompress time. It runs with a type of state table so you can even post it data one byte at a time. If your log is purely ASCII, you will probably get some good ratios.

A google search should find it as lzw or Lempel Ziv Welch. Here's a good article:


Featured Post

What Is Threat Intelligence?

Threat intelligence is often discussed, but rarely understood. Starting with a precise definition, along with clear business goals, is essential.

Join & Write a Comment

An Outlet in Cocoa is a persistent reference to a GUI control; it connects a property (a variable) to a control.  For example, it is common to create an Outlet for the text field GUI control and change the text that appears in this field via that Ou…
This is a short and sweet, but (hopefully) to the point article. There seems to be some fundamental misunderstanding about the function prototype for the "main" function in C and C++, more specifically what type this function should return. I see so…
The goal of this video is to provide viewers with basic examples to understand and use pointers in the C programming language.
The goal of this video is to provide viewers with basic examples to understand and use switch statements in the C programming language.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now