Solved

>>|<< Bit packing

Posted on 1998-09-18
25
534 Views
Last Modified: 2008-02-20
Hi,


    I am trying to pack 4 6 bits data into 3 bytes. How do I check whether the packing that I have done is correct? I
    don't want to write the codes for unpacking the data before checking that the codes for packing is correct. Following
    is my code segment :

    int packbit (unsigned char *a, int n, unsigned char *b)
    {
      int i;
      int loop=n/4;
       
    for(i=0; i<loop;a+=4,b+=3)
       {
          b[0]=a[0]+(unsigned char)(a[1]<<6);
          b[1]=(a[1]>>2)+(unsigned char)(a[2]<<4);
          b[2]=(a[2]>>4)+(unsigned char)(a[3]<<2);
       }
    }

    Pls help with some code examples. Thanks.

    Rgds,
    PY
0
Comment
Question by:peiyoke
  • 10
  • 9
  • 2
  • +3
25 Comments
 
LVL 2

Expert Comment

by:gysbert1
ID: 1173062
what do you pass as n ?
0
 
LVL 22

Expert Comment

by:nietod
ID: 1173063
One problem is that you do not increment "i" in your loop.

In addition I'm not sure, but I think you may be organizing the bits wrong.  How do you want them to end up?
0
 
LVL 2

Expert Comment

by:gysbert1
ID: 1173064
I cannot see why you need the loop at all, that is why I want to know what you pass in n and what it should do. It seems that you want to pack an array of 6 bit numbers into an array of bytes.


0
 
LVL 22

Expert Comment

by:nietod
ID: 1173065
n is the number of bytes to be packed.  This is used to pack an array bytes into shorter array.  The length of the source array must be a multiple of 4 bytes and the destination will be exactly 3/4 as long.
0
 
LVL 2

Expert Comment

by:gysbert1
ID: 1173066
I checked the code for you ...

Looks just fine to me ! (that is if you add a i++ to the loop increment part)

(I entered the code and ran it over a few iterations with known data and got the correct output)
0
 
LVL 22

Expert Comment

by:nietod
ID: 1173067
Wait a second.  Tthere are at least two common algorithms for bit packing.  We can't say it is correct for sure, unless we know what the desired packing is.  With the loop fixed, it doesn't crash or loop for ever, but it might not be packing in the manner you want.
0
 
LVL 2

Expert Comment

by:gysbert1
ID: 1173068
Your code would pack
a    3                  2              1                0
  00000001   00000011   00000111   00000001
      MSB                                           LSB

four eight bit bytes each containing only 6 bits of data
into
b       2               1               0
  00000100   00110001   11000001
      MSB                           LSB


I assumed you are talking about a bitstream with extra 0's in that you want to remove to either save space or convert to a compact stream looking most the same ...

It is very true what nietod said, it may even save you some trouble later on if this was correct so make sure in any case. Thanx for pointing that out !
0
 

Expert Comment

by:Forge
ID: 1173069
int packbit (unsigned char *a, int n, unsigned char *b)  {
// char a[n/4];  char b[n/3]
  for(;  n>=0; n-=4, a+=4,b+=3)
           {
              b[0]=a[0]+(unsigned char)(a[1]<<6);            // a[0]<5..0> + (a[1]<1..0> <<6) OK
              b[1]=(a[1]>>2)+(unsigned char)(a[2]<<4);    // a[1]<5..2> + (a[2]<3..0> <<4) OK
              b[2]=(a[2]>>4)+(unsigned char)(a[3]<<2);    // a[2]<5..4> + (a[3]<5..0> <<2) OK
           }
        }

Seems to work just fine. Why dont you just make a decoding algorithm by inverting this one, and then pack the ****, and depack it again and check agains original input?
The inverse algorithm is:

int depackbit (unsigned char *a, int n, unsigned char *b)  {

// char a[n/3];  char b[n/4]
  for(;  n>=0; n-=4, a+=3,b+=4)
           {          
              b[0]=a[0 ] & (0x3f);
              b[1]=(a[0] >>6) | (a[1]<<2);
              b[2]=..
              b[3]=..
           }
        }
0
 
LVL 22

Expert Comment

by:nietod
ID: 1173070
The current code starts with 4 bytes whose bits are indicated by the following letters (0 bits are lost)

00abcdef 00ghijkl 00mnopqr 00stuvwx

they are packed to 3 bytes

klabcdef opqrghij  stuvwxmn


0
 

Author Comment

by:peiyoke
ID: 1173071

Experts, first of all thanks for all the comments. There was a typo error that I didn't increment the i.

I did try to run the code but the output is incorrect.

gysbert1, how did u manage to get the correct output? I wrote another code segment to unpack the data and the output is incorrect.

What I am trying to do is just to save space. As long as the data being unpacked is the same as the original data, it doesn't matter whether what packing algorithm being used.

So after all the comments, how do I do the checking whether my code is correct?
0
 
LVL 22

Expert Comment

by:nietod
ID: 1173072
Post the code you used for unpacking.
0
 
LVL 2

Expert Comment

by:gysbert1
ID: 1173073
nietod:

if you have
    a[0]=00abcdef a[1]=00ghijkl a[2]= 00mnopqr a[3]=00stuvwx
       they are packed to 3 bytes
    b[0]=klabcdef  b[1]=opqrghij  b[2]=stuvwxmn

as you showed but if  :
 
    a[3]=00abcdef a[2]=00ghijkl a[1]= 00mnopqr a[0]=00stuvwx
       they are packed to 3 bytes
    b[2]=abcdefgh  b[1]=ijklmnop  b[0]=qrstuvwx

That is exactly the difference between a little endian and big endian machines.

From http://webopedia.internet.com/TERM/b/big_endian.html :

(Big-Endian)
Refers to which bytes are most significant in
multi-byte data types. In big-endian architectures, the
leftmost bytes (those with a lower address) are most
significant. In little-endian architectures, the rightmost
bytes are most significant. For example, consider the
number 1025 (2 to the tenth power plus one) stored
in a 4-byte integer

Go have a look for a good graphical explanation.

Since I have a little endian (Intel Pentium) machine I converted the 3 bytes (padded with a 0 byte at the msb) to an integer which gave me a number with b[0] = lsb and b[4] (the 0 padded byte) the most significant.

I then typed this integer into the windows scientific calculator and converted it to a binary number to check the bit sequence ... Easy as that.

I also suggest you post the code for unpacking and maybe give some more detail as to what this is for so that we can make sure you are getting the results you need. If what I showed above is what you intended (as I assumed) there is definately something wrong in your unpacking code.
0
Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

 
LVL 22

Expert Comment

by:nietod
ID: 1173074
gyspert, the big endian/little endian issue doesn't matter when the data is being accessed by bytes.  It would only matter if you read in a 32 bit source value and then packed it.  His algorithm will yeild identical results on all machines.
0
 
LVL 2

Expert Comment

by:gysbert1
ID: 1173075
I only used the Little/Big endian issue as an example to show why the two result sets differ according to the way you look at it. NIETOD made an error in his example because he chose to write the LSBit on the right hand side and the LSByte on the left. It is a classic Little/Big endian mistake to make.

We are all used to writing numbers with the most significant part on the left hand side (as you showed in your example of why the code seems wrong). That is part of the debate on Little/Big endian. Both type of implemetations exist in hardware and work fine. It is however more logical to write the leas significant BIT on the right hand side since that is the way we are used to writing numbers.
With
a[0]=00abcdef a[1]=00ghijkl a[2]= 00mnopqr a[3]=00stuvwx
You have the least significant byte (a[0]) on the left but the least significant bit of a[0] (f) on the right hand side. You are thus utterly confusing everyone by suggesting that they should fit giving  abcdefghijklmnopqrstuvx if viewed as a bitstream.

Remember that a[0] arrived first. If we want to construct a stream we drop the most significant two bits to change the 8 bit stream into a 6bit stream. It would arrive:
 a and then b and then c ... LSBit first and LSByte first (consistency, this is a binary stream). We would get   a  then  ba then  cba  then  dcba   then edcba then fedcba and then two zeros leaving  00fedcba as the LSByte. Next one would give 00lkjihg etc.

If you now write all the bytes (LSB on the Right) you get  ...   00lkjihg  00fedcba

This packs perfectly to  ...lkjihgfedcba with his code if once again we write the LSB on the right hand side. (I ommited the first two bytes for  sake of time and space)

It is an exact analogy to the Big/Little endian debate and I cannot see why nietod has a problem with it ? Although we are not packing to 32bit numbers, if we mix and match the endianness in the bits and bytes we will get something like

klabcdef opqrghij  stuvwxmn

as NIETOD was so kind as to show us. His mistake is a classic example of the NUXI problem described at the URL I sited, although a bit more complex than just NUXI.
0
 
LVL 22

Expert Comment

by:nietod
ID: 1173076
gysbert, you missunderstand me  From the very start I said

"There are at least two common algorithms for bit packing."

I supposed you could say they correspond to the little/big endian schemes.  I don't really agree with that 100% as both algorityms work under either scheme, it really just depends on what you want/need.

I did not say that the bits were being packed in the wrong order.  I was just trying to graphically show him where they were being placed.  (I felt your example was insufficient because the bits were only 1's or 0's and therefore indistinguishable.)

In any case, the algorithm used doesn't matter, we just need an unpack that matches the pack.
0
 
LVL 2

Expert Comment

by:gysbert1
ID: 1173077
Sorry nietod, I thought that your example was meant to show that the packing was wrong since "klabcdef opqrghij  stuvwxmn" did look completely wrong. It might have helped if you included both common ways ...

For unpacking something like :

a[0] = b[0] & 0x3f;
a[1] = ((b[1] << 2) | ( b[0] >> 6)) & 0x3f;
a[2] = ((b[1] >> 4) | (b[2] << 4)) & 0x3f;
a[3] = b[2] >> 2;

in the loop should work. I did not check this completely though !
0
 
LVL 22

Expert Comment

by:nietod
ID: 1173078
>> It might have helped if you included both common ways

I probably couldn't.  I've written tons of big-endian code.  But several million lines (literally) of little endian assembly code later...well it hurts to think about it.

Your unpack code looks right.
0
 

Accepted Solution

by:
gbeau earned 10 total points
ID: 1173079
You could probably make your code simpler by first stuffing the 24 bits into a long, then extracting the lower three bytes of the long:

long result = 0;

// Make sure you really do have just 6 bits in each byte
assert( a[0] & 0xC0 );
assert( a[1] & 0xC0 );
assert( a[2] & 0xC0 );

// Move each 6 bit thing to whatever position you want
result = result | (long(a0) << 12);
result = result | (long(a1) << 6);
result = result | (long(a2) << 0);

// Extract the lower three bytes
b[2] = (result >> 16) & 0xFF;
b[1] = (result >> 8)  & 0xFF;
b[0] = (result >> 0)  & 0xFF;



0
 
LVL 2

Expert Comment

by:gysbert1
ID: 1173080
The code you posted would of course work on any machine while the code gbeau gave only works on little endian machines  ;^)
0
 
LVL 84

Expert Comment

by:ozo
ID: 1173081
the code gbeau gave works equally well on big endian machines
0
 
LVL 2

Expert Comment

by:gysbert1
ID: 1173082
Sorry, my mistake. I did not have a good look. You are indeed correct ozo...
0
 
LVL 22

Expert Comment

by:nietod
ID: 1173083
Yeah, it will work on any machine.  The endian scheme only matters when reading and writting multi-byte values.   However it does assume that "long" is at least 32 bits.  I'm not sure if that is gauranteed or not.
0
 
LVL 84

Expert Comment

by:ozo
ID: 1173084
LONG_MAX must be at least +2147483647
0
 

Expert Comment

by:gbeau
ID: 1173085
All the longs I've seen are 4 bytes, but if you're worried about it, just add this to check:
   assert( sizeof(long) >= 4 );
In VC++, you could also use ULONG, LONG, or DWORD for "result" instead of long.  I'd probably use DWORD for "result", and BYTE for the arrays of input and output bytes.

BTW, I'm not sure whether the ordering of the bytes in my code ends up the same as in peiyoke's original code, but it's easy enough to change - just reverse the order of the last three lines if need be.

BTW2, I didn't actually answer peiyoke's question: "How do I check whether the packing that I have done is correct?"  For that, I would suggest feeding in some test data, and printing out the output bytes (probably in hex).  Or just check each step with a debugger.

BTW3, the compiler may complain about converting longs to chars in the last three lines.  If so, just add a cast:
   b[2] = (unsigned char)((result >> 16) & 0xFF);

BTW4, the shifts by 0 are clearly unnecessary.  I just put them in to make the code look clearer and more consistent.  The compiler will probably eliminate them.

Hope this answers all the concerns...

0
 
LVL 22

Expert Comment

by:nietod
ID: 1173086
>> All the longs I've seen are 4 bytes, but if you're worried about it, just add this to check:
According to ozo, you are guaranteed that on all C++ implimentations that "long" will be able to hold at least +2147483647.  By some weird coincidence it just happens that this strange number is the highest value of a signed 4 byte integer.  Thus no check is needed.
0

Featured Post

Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

Join & Write a Comment

What is C++ STL?: STL stands for Standard Template Library and is a part of standard C++ libraries. It contains many useful data structures (containers) and algorithms, which can spare you a lot of the time. Today we will look at the STL Vector. …
C++ Properties One feature missing from standard C++ that you will find in many other Object Oriented Programming languages is something called a Property (http://www.experts-exchange.com/Programming/Languages/CPP/A_3912-Object-Properties-in-C.ht…
The viewer will learn how to use the return statement in functions in C++. The video will also teach the user how to pass data to a function and have the function return data back for further processing.
The viewer will learn how to user default arguments when defining functions. This method of defining functions will be contrasted with the non-default-argument of defining functions.

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now