Writing Huffman Code Bits to a file without padding

I'm currently writing a Huffman encoding program and I'm horribly stumped as to how to output the bits to the file without the extra bits at the end of the last byte.  Is there a way to truncate the last byte?? pseudocode and/or C++ code would be greatly appreciated.  Thanks in advance.
ashsaberAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

_nn_Commented:
A file (for that matter, memory behaves the same) is a collection of bytes. That's the smallest unit. You can zero these last bits if you want, but I don't think there's a way to cut them off. At best, I could imagine that you could make some use of these bits by storing there the first bits of the next sequence. I'm not sure it would be so useful though.

What is the exact trouble with the spurious bits ? Please show us what you've done so far, that would certainly help.
mtmikeCommented:
You can't truncate the byte. I assume you need this for properly decoding an encoded file.

This can be accomplished by storing the number of valid bits in the last byte somewhere in the header of the encoded file. Storing a value of 1-8 only takes three bits.
bcladdCommented:
The suggestion of encoding the number of spare bits in the last byte makes sense and only costs you 3 bits as mtmike says. An alternative is to encode an end of file character and stop decoding the file when the EOF is decoded. Advantage of EOF is that it would permit multiple encoded files in one file (assuming they use the same encoding tree). Disadvantage is that it is EOF is a rare character and it is likely to cost more than 3 bits in the stream.

One disadvantage of encoding the number of bits in the last byte (or the number of slack bits, same difference) is you need to know that count earlier, probably in the file header. This means you either have to go back and do fixup in the header or encode the entire file in memory before writing it out. Not as clean as generating a small buffer of bits and writing whenever it is full. More memory management issues.

Just some thinking about it.

-bcl
Exploring SharePoint 2016

Explore SharePoint 2016, the web-based, collaborative platform that integrates with Microsoft Office to provide intranets, secure document management, and collaboration so you can develop your online and offline capabilities.

mtmikeCommented:
I think that you can compute the number of output bits in advance using the frequency table and bit lengths of the encoded characters.

Suppose the input consists of only the characters 'a', 'b' and 'c' and

a: frequency=40 encoding=1
b: frequency=10 encoding=01
c: frequency=12 encoding=00

Then the number of output bits is 40 * 1 + 10 * 2 + 12 * 2 = 84

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
bcladdCommented:
Yes, of course you can. My brain damage is acting up again.

-bcl
burcarpatCommented:
alternatively, you can use boost dynamic_bitset for storage and it has support for iostreams:

    http://www.boost.org/libs/dynamic_bitset/dynamic_bitset.html

[ about boost.org :: boost.org is an organization supported by many c++ standards committee members and provides 100% free, peer-reviewed, cross-platform libraries.  many of the boost libraries, such as their smart pointer library, are already in the drafts of the next revision of the c++ standard ]

-- ba
tinchosCommented:
No comment has been added lately, so it's time to clean up this TA.
I will leave the following recommendation for this question in the Cleanup topic area:

Split: mtmike {http:#9686458} & bcladd {http:#9686462} & burcarpat {http:#9691207}

Please leave any comments here within the next seven days.
PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

Tinchos
EE Cleanup Volunteer
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
C++

From novice to tech pro — start learning today.