# Bit packing

Hi,

I am trying to pack four 6-bit data values into 3 bytes. How do I check whether the packing I have done is correct? I
don't want to write the code for unpacking the data before checking that the code for packing is correct. Following
is my code segment:

int packbit (unsigned char *a, int n, unsigned char *b)
{
    int i;
    int loop = n/4;

    for (i = 0; i < loop; a += 4, b += 3)
    {
        b[0] = a[0] + (unsigned char)(a[1] << 6);
        b[1] = (a[1] >> 2) + (unsigned char)(a[2] << 4);
        b[2] = (a[2] >> 4) + (unsigned char)(a[3] << 2);
    }
}

Please help with some code examples. Thanks.

Rgds,
PY

Commented:
You could probably make your code simpler by first stuffing the 24 bits into a long, then extracting the lower three bytes of the long:

long result = 0;

// Make sure you really do have just 6 bits in each byte
assert( (a[0] & 0xC0) == 0 );
assert( (a[1] & 0xC0) == 0 );
assert( (a[2] & 0xC0) == 0 );
assert( (a[3] & 0xC0) == 0 );

// Move each 6 bit thing to whatever position you want
result = result | (long(a[0]) << 18);
result = result | (long(a[1]) << 12);
result = result | (long(a[2]) << 6);
result = result | (long(a[3]) << 0);

// Extract the lower three bytes
b[2] = (result >> 16) & 0xFF;
b[1] = (result >> 8)  & 0xFF;
b[0] = (result >> 0)  & 0xFF;


Commented:
what do you pass as n ?

Commented:
One problem is that you do not increment "i" in your loop.

In addition I'm not sure, but I think you may be organizing the bits wrong.  How do you want them to end up?

Commented:
I cannot see why you need the loop at all; that is why I want to know what you pass in n and what it should do. It seems that you want to pack an array of 6-bit numbers into an array of bytes.


Commented:
n is the number of bytes to be packed.  This is used to pack an array of bytes into a shorter array.  The length of the source array must be a multiple of 4 bytes and the destination will be exactly 3/4 as long.

Commented:
I checked the code for you ...

Looks just fine to me! (that is, if you add an i++ to the loop increment part)

(I entered the code and ran it over a few iterations with known data and got the correct output)

Commented:
Wait a second.  There are at least two common algorithms for bit packing.  We can't say it is correct for sure unless we know what the desired packing is.  With the loop fixed, it doesn't crash or loop forever, but it might not be packing in the manner you want.

Commented:
a       3          2          1          0
     00000001   00000011   00000111   00000001
     MSB                                  LSB

four 8-bit bytes, each containing only 6 bits of data,
into

b       2          1          0
     00000100   00110001   11000001
     MSB                      LSB

I assumed you are talking about a bitstream with extra 0's that you want to remove, either to save space or to convert to a compact stream that looks much the same ...

What nietod said is very true; making sure of this now may even save you some trouble later on, so check in any case. Thanx for pointing that out!

Commented:
int packbit (unsigned char *a, int n, unsigned char *b)  {
    // a holds n bytes (n a multiple of 4); b must hold 3*n/4 bytes
    for (; n >= 4; n -= 4, a += 4, b += 3)
    {
        b[0] = a[0] + (unsigned char)(a[1] << 6);          // a[0]<5..0> + (a[1]<1..0> << 6) OK
        b[1] = (a[1] >> 2) + (unsigned char)(a[2] << 4);   // a[1]<5..2> + (a[2]<3..0> << 4) OK
        b[2] = (a[2] >> 4) + (unsigned char)(a[3] << 2);   // a[2]<5..4> + (a[3]<5..0> << 2) OK
    }
}

Seems to work just fine. Why don't you just make a decoding algorithm by inverting this one, then pack the ****, depack it again and check against the original input?
The inverse algorithm is:

int depackbit (unsigned char *a, int n, unsigned char *b)  {
    // a holds n packed bytes (n a multiple of 3); b must hold 4*n/3 bytes
    for (; n >= 3; n -= 3, a += 3, b += 4)
    {
        b[0] = a[0] & 0x3f;
        b[1] = ((a[0] >> 6) | (a[1] << 2)) & 0x3f;
        b[2] = ..
        b[3] = ..
    }
}

Commented:
The current code starts with 4 bytes whose bits are indicated by the following letters (0 bits are lost)

00abcdef 00ghijkl 00mnopqr 00stuvwx

they are packed to 3 bytes

klabcdef opqrghij  stuvwxmn


Author Commented:

Experts, first of all thanks for all the comments. There was a typo: I didn't increment i.

I did try to run the code, but the output is incorrect.

gysbert1, how did you manage to get the correct output? I wrote another code segment to unpack the data and the output is incorrect.

What I am trying to do is just save space. As long as the data being unpacked is the same as the original data, it doesn't matter what packing algorithm is used.

So after all the comments, how do I check whether my code is correct?

Commented:
Post the code you used for unpacking.

Commented:
nietod:

if you have
a[0]=00abcdef a[1]=00ghijkl a[2]= 00mnopqr a[3]=00stuvwx
they are packed to 3 bytes
b[0]=klabcdef  b[1]=opqrghij  b[2]=stuvwxmn

as you showed but if  :

a[3]=00abcdef a[2]=00ghijkl a[1]= 00mnopqr a[0]=00stuvwx
they are packed to 3 bytes
b[2]=abcdefgh  b[1]=ijklmnop  b[0]=qrstuvwx

That is exactly the difference between little-endian and big-endian machines.

From http://webopedia.internet.com/TERM/b/big_endian.html :

(Big-Endian)
Refers to which bytes are most significant in
multi-byte data types. In big-endian architectures, the
leftmost bytes (those with a lower address) are most
significant. In little-endian architectures, the rightmost
bytes are most significant. For example, consider the
number 1025 (2 to the tenth power plus one) stored
in a 4-byte integer

Go have a look for a good graphical explanation.

Since I have a little-endian (Intel Pentium) machine, I converted the 3 bytes (padded with a 0 byte at the MSB) to an integer, which gave me a number with b[0] as the least significant byte and b[3] (the 0-padded byte) as the most significant.

I then typed this integer into the Windows scientific calculator and converted it to a binary number to check the bit sequence ... Easy as that.

I also suggest you post the code for unpacking and maybe give some more detail as to what this is for, so that we can make sure you are getting the results you need. If what I showed above is what you intended (as I assumed), there is definitely something wrong in your unpacking code.

Commented:
gysbert, the big-endian/little-endian issue doesn't matter when the data is being accessed by bytes.  It would only matter if you read in a 32-bit source value and then packed it.  His algorithm will yield identical results on all machines.

Commented:
I only used the Little/Big endian issue as an example to show why the two result sets differ according to the way you look at it. NIETOD made an error in his example because he chose to write the LSBit on the right hand side and the LSByte on the left. It is a classic Little/Big endian mistake to make.

We are all used to writing numbers with the most significant part on the left hand side (as you showed in your example of why the code seems wrong). That is part of the debate on Little/Big endian. Both types of implementations exist in hardware and work fine. It is however more logical to write the least significant BIT on the right hand side, since that is the way we are used to writing numbers.
With
a[0]=00abcdef a[1]=00ghijkl a[2]=00mnopqr a[3]=00stuvwx
You have the least significant byte (a[0]) on the left but the least significant bit of a[0] (f) on the right hand side. You are thus utterly confusing everyone by suggesting that they should fit together, giving abcdefghijklmnopqrstuvwx if viewed as a bitstream.

Remember that a[0] arrived first. If we want to construct a stream we drop the most significant two bits to change the 8 bit stream into a 6bit stream. It would arrive:
a and then b and then c ... LSBit first and LSByte first (consistency, this is a binary stream). We would get   a  then  ba then  cba  then  dcba   then edcba then fedcba and then two zeros leaving  00fedcba as the LSByte. Next one would give 00lkjihg etc.

If you now write all the bytes (LSB on the Right) you get  ...   00lkjihg  00fedcba

This packs perfectly to ...lkjihgfedcba with his code if once again we write the LSB on the right hand side. (I omitted the first two bytes for the sake of time and space.)

It is an exact analogy to the Big/Little endian debate and I cannot see why nietod has a problem with it? Although we are not packing to 32-bit numbers, if we mix and match the endianness in the bits and bytes we will get something like

klabcdef opqrghij  stuvwxmn

as NIETOD was so kind as to show us. His mistake is a classic example of the NUXI problem described at the URL I cited, although a bit more complex than just NUXI.

Commented:
gysbert, you misunderstand me.  From the very start I said

"There are at least two common algorithms for bit packing."

I suppose you could say they correspond to the little/big endian schemes.  I don't really agree with that 100%, as both algorithms work under either scheme; it really just depends on what you want/need.

I did not say that the bits were being packed in the wrong order.  I was just trying to graphically show him where they were being placed.  (I felt your example was insufficient because the bits were only 1's or 0's and therefore indistinguishable.)

In any case, the algorithm used doesn't matter, we just need an unpack that matches the pack.

Commented:
Sorry nietod, I thought that your example was meant to show that the packing was wrong since "klabcdef opqrghij  stuvwxmn" did look completely wrong. It might have helped if you included both common ways ...

For unpacking something like :

a[0] = b[0] & 0x3f;
a[1] = ((b[1] << 2) | ( b[0] >> 6)) & 0x3f;
a[2] = ((b[1] >> 4) | (b[2] << 4)) & 0x3f;
a[3] = b[2] >> 2;

in the loop should work. I did not check this completely though !

Commented:
>> It might have helped if you included both common ways

I probably couldn't.  I've written tons of big-endian code.  But several million lines (literally) of little endian assembly code later...well it hurts to think about it.


Commented:
The code you posted would of course work on any machine while the code gbeau gave only works on little endian machines  ;^)

Commented:
the code gbeau gave works equally well on big endian machines

Commented:
Sorry, my mistake. I did not have a good look. You are indeed correct ozo...

Commented:
Yeah, it will work on any machine.  The endian scheme only matters when reading and writing multi-byte values.  However, it does assume that "long" is at least 32 bits.  I'm not sure if that is guaranteed or not.

Commented:
LONG_MAX must be at least +2147483647

Commented:
All the longs I've seen are 4 bytes, but if you're worried about it, just add this to check:
assert( sizeof(long) >= 4 );
In VC++, you could also use ULONG, LONG, or DWORD for "result" instead of long.  I'd probably use DWORD for "result", and BYTE for the arrays of input and output bytes.

BTW, I'm not sure whether the ordering of the bytes in my code ends up the same as in peiyoke's original code, but it's easy enough to change - just reverse the order of the last three lines if need be.

BTW2, I didn't actually answer peiyoke's question: "How do I check whether the packing that I have done is correct?"  For that, I would suggest feeding in some test data, and printing out the output bytes (probably in hex).  Or just check each step with a debugger.

BTW3, the compiler may complain about converting longs to chars in the last three lines.  If so, just add a cast:
b[2] = (unsigned char)((result >> 16) & 0xFF);

BTW4, the shifts by 0 are clearly unnecessary.  I just put them in to make the code look clearer and more consistent.  The compiler will probably eliminate them.

Hope this answers all the concerns...


Commented:
>> All the longs I've seen are 4 bytes, but if you're worried about it, just add this to check:
According to ozo, you are guaranteed on all C++ implementations that "long" will be able to hold at least +2147483647.  By some weird coincidence it just happens that this strange number is the highest value of a signed 4-byte integer.  Thus no check is needed.