Solved

Writing Huffman Code Bits to a file without padding

Posted on 2003-11-05
8
1,381 Views
Last Modified: 2007-12-19
I'm currently writing a Huffman encoding program and I'm horribly stumped as to how to output the bits to the file without the extra bits at the end of the last byte.  Is there a way to truncate the last byte?? pseudocode and/or C++ code would be greatly appreciated.  Thanks in advance.
0
Comment
Question by:ashsaber
8 Comments
 
LVL 16

Expert Comment

by:_nn_
ID: 9685758
A file (for that matter, memory behaves the same) is a collection of bytes. That's the smallest unit. You can zero these last bits if you want, but I don't think there's a way to cut them off. At best, I could imagine that you could make some use of these bits by storing there the first bits of the next sequence. I'm not sure it would be so useful though.

What is the exact trouble with the spurious bits ? Please show us what you've done so far, that would certainly help.
0
 
LVL 5

Expert Comment

by:mtmike
ID: 9686111
You can't truncate the byte. I assume you need this for properly decoding an encoded file.

This can be accomplished by storing the number of valid bits in the last byte somewhere in the header of the encoded file. Storing a value of 1-8 only takes three bits.
0
 
LVL 11

Expert Comment

by:bcladd
ID: 9686233
The suggestion of encoding the number of spare bits in the last byte makes sense and only costs you 3 bits as mtmike says. An alternative is to encode an end of file character and stop decoding the file when the EOF is decoded. Advantage of EOF is that it would permit multiple encoded files in one file (assuming they use the same encoding tree). Disadvantage is that it is EOF is a rare character and it is likely to cost more than 3 bits in the stream.

One disadvantage of encoding the number of bits in the last byte (or the number of slack bits, same difference) is you need to know that count earlier, probably in the file header. This means you either have to go back and do fixup in the header or encode the entire file in memory before writing it out. Not as clean as generating a small buffer of bits and writing whenever it is full. More memory management issues.

Just some thinking about it.

-bcl
0
What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 
LVL 5

Accepted Solution

by:
mtmike earned 168 total points
ID: 9686458
I think that you can compute the number of output bits in advance using the frequency table and bit lengths of the encoded characters.

Suppose the input consists of only the characters 'a', 'b' and 'c' and

a: frequency=40 encoding=1
b: frequency=10 encoding=01
c: frequency=12 encoding=00

Then the number of output bits is 40 * 1 + 10 * 2 + 12 * 2 = 84
0
 
LVL 11

Assisted Solution

by:bcladd
bcladd earned 166 total points
ID: 9686462
Yes, of course you can. My brain damage is acting up again.

-bcl
0
 
LVL 7

Assisted Solution

by:burcarpat
burcarpat earned 166 total points
ID: 9691207
alternatively, you can use boost dynamic_bitset for storage and it has support for iostreams:

    http://www.boost.org/libs/dynamic_bitset/dynamic_bitset.html

[ about boost.org :: boost.org is an organization supported by many c++ standards committee members and provides 100% free, peer-reviewed, cross-platform libraries.  many of the boost libraries, such as their smart pointer library, are already in the drafts of the next revision of the c++ standard ]

-- ba
0
 
LVL 9

Expert Comment

by:tinchos
ID: 10242523
No comment has been added lately, so it's time to clean up this TA.
I will leave the following recommendation for this question in the Cleanup topic area:

Split: mtmike {http:#9686458} & bcladd {http:#9686462} & burcarpat {http:#9691207}

Please leave any comments here within the next seven days.
PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

Tinchos
EE Cleanup Volunteer
0

Featured Post

Free Trending Threat Insights Every Day

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

Introduction This article is the first in a series of articles about the C/C++ Visual Studio Express debugger.  It provides a quick start guide in using the debugger. Part 2 focuses on additional topics in breakpoints.  Lastly, Part 3 focuses on th…
Go is an acronym of golang, is a programming language developed Google in 2007. Go is a new language that is mostly in the C family, with significant input from Pascal/Modula/Oberon family. Hence Go arisen as low-level language with fast compilation…
The viewer will learn how to user default arguments when defining functions. This method of defining functions will be contrasted with the non-default-argument of defining functions.
The viewer will be introduced to the technique of using vectors in C++. The video will cover how to define a vector, store values in the vector and retrieve data from the values stored in the vector.

706 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now