Writing Huffman Code Bits to a file without padding

I'm currently writing a Huffman encoding program and I'm horribly stumped as to how to output the bits to the file without the extra bits at the end of the last byte.  Is there a way to truncate the last byte?? pseudocode and/or C++ code would be greatly appreciated.  Thanks in advance.
ashsaberAsked:
Who is Participating?
 
mtmikeCommented:
I think that you can compute the number of output bits in advance using the frequency table and bit lengths of the encoded characters.

Suppose the input consists of only the characters 'a', 'b' and 'c' and

a: frequency=40 encoding=1
b: frequency=10 encoding=01
c: frequency=12 encoding=00

Then the number of output bits is 40 * 1 + 10 * 2 + 12 * 2 = 84
0
 
_nn_Commented:
A file (for that matter, memory behaves the same) is a collection of bytes. That's the smallest unit. You can zero these last bits if you want, but I don't think there's a way to cut them off. At best, I could imagine that you could make some use of these bits by storing there the first bits of the next sequence. I'm not sure it would be so useful though.

What is the exact trouble with the spurious bits ? Please show us what you've done so far, that would certainly help.
0
 
mtmikeCommented:
You can't truncate the byte. I assume you need this for properly decoding an encoded file.

This can be accomplished by storing the number of valid bits in the last byte somewhere in the header of the encoded file. Storing a value of 1-8 only takes three bits.
0
Cloud Class® Course: CompTIA Cloud+

The CompTIA Cloud+ Basic training course will teach you about cloud concepts and models, data storage, networking, and network infrastructure.

 
bcladdCommented:
The suggestion of encoding the number of spare bits in the last byte makes sense and only costs you 3 bits as mtmike says. An alternative is to encode an end of file character and stop decoding the file when the EOF is decoded. Advantage of EOF is that it would permit multiple encoded files in one file (assuming they use the same encoding tree). Disadvantage is that it is EOF is a rare character and it is likely to cost more than 3 bits in the stream.

One disadvantage of encoding the number of bits in the last byte (or the number of slack bits, same difference) is you need to know that count earlier, probably in the file header. This means you either have to go back and do fixup in the header or encode the entire file in memory before writing it out. Not as clean as generating a small buffer of bits and writing whenever it is full. More memory management issues.

Just some thinking about it.

-bcl
0
 
bcladdCommented:
Yes, of course you can. My brain damage is acting up again.

-bcl
0
 
burcarpatCommented:
alternatively, you can use boost dynamic_bitset for storage and it has support for iostreams:

    http://www.boost.org/libs/dynamic_bitset/dynamic_bitset.html

[ about boost.org :: boost.org is an organization supported by many c++ standards committee members and provides 100% free, peer-reviewed, cross-platform libraries.  many of the boost libraries, such as their smart pointer library, are already in the drafts of the next revision of the c++ standard ]

-- ba
0
 
tinchosCommented:
No comment has been added lately, so it's time to clean up this TA.
I will leave the following recommendation for this question in the Cleanup topic area:

Split: mtmike {http:#9686458} & bcladd {http:#9686462} & burcarpat {http:#9691207}

Please leave any comments here within the next seven days.
PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

Tinchos
EE Cleanup Volunteer
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.