?
Solved

Writing Huffman Code Bits to a file without padding

Posted on 2003-11-05
8
Medium Priority
?
1,402 Views
Last Modified: 2007-12-19
I'm currently writing a Huffman encoding program and I'm horribly stumped as to how to output the bits to the file without the extra bits at the end of the last byte.  Is there a way to truncate the last byte?? pseudocode and/or C++ code would be greatly appreciated.  Thanks in advance.
0
Comment
Question by:ashsaber
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
8 Comments
 
LVL 16

Expert Comment

by:_nn_
ID: 9685758
A file (for that matter, memory behaves the same) is a collection of bytes. That's the smallest unit. You can zero these last bits if you want, but I don't think there's a way to cut them off. At best, I could imagine that you could make some use of these bits by storing there the first bits of the next sequence. I'm not sure it would be so useful though.

What is the exact trouble with the spurious bits ? Please show us what you've done so far, that would certainly help.
0
 
LVL 5

Expert Comment

by:mtmike
ID: 9686111
You can't truncate the byte. I assume you need this for properly decoding an encoded file.

This can be accomplished by storing the number of valid bits in the last byte somewhere in the header of the encoded file. Storing a value of 1-8 only takes three bits.
0
 
LVL 11

Expert Comment

by:bcladd
ID: 9686233
The suggestion of encoding the number of spare bits in the last byte makes sense and only costs you 3 bits as mtmike says. An alternative is to encode an end of file character and stop decoding the file when the EOF is decoded. Advantage of EOF is that it would permit multiple encoded files in one file (assuming they use the same encoding tree). Disadvantage is that it is EOF is a rare character and it is likely to cost more than 3 bits in the stream.

One disadvantage of encoding the number of bits in the last byte (or the number of slack bits, same difference) is you need to know that count earlier, probably in the file header. This means you either have to go back and do fixup in the header or encode the entire file in memory before writing it out. Not as clean as generating a small buffer of bits and writing whenever it is full. More memory management issues.

Just some thinking about it.

-bcl
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 5

Accepted Solution

by:
mtmike earned 672 total points
ID: 9686458
I think that you can compute the number of output bits in advance using the frequency table and bit lengths of the encoded characters.

Suppose the input consists of only the characters 'a', 'b' and 'c' and

a: frequency=40 encoding=1
b: frequency=10 encoding=01
c: frequency=12 encoding=00

Then the number of output bits is 40 * 1 + 10 * 2 + 12 * 2 = 84
0
 
LVL 11

Assisted Solution

by:bcladd
bcladd earned 664 total points
ID: 9686462
Yes, of course you can. My brain damage is acting up again.

-bcl
0
 
LVL 7

Assisted Solution

by:burcarpat
burcarpat earned 664 total points
ID: 9691207
alternatively, you can use boost dynamic_bitset for storage and it has support for iostreams:

    http://www.boost.org/libs/dynamic_bitset/dynamic_bitset.html

[ about boost.org :: boost.org is an organization supported by many c++ standards committee members and provides 100% free, peer-reviewed, cross-platform libraries.  many of the boost libraries, such as their smart pointer library, are already in the drafts of the next revision of the c++ standard ]

-- ba
0
 
LVL 9

Expert Comment

by:tinchos
ID: 10242523
No comment has been added lately, so it's time to clean up this TA.
I will leave the following recommendation for this question in the Cleanup topic area:

Split: mtmike {http:#9686458} & bcladd {http:#9686462} & burcarpat {http:#9691207}

Please leave any comments here within the next seven days.
PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

Tinchos
EE Cleanup Volunteer
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Errors will happen. It is a fact of life for the programmer. How and when errors are detected have a great impact on quality and cost of a product. It is better to detect errors at compile time, when possible and practical. Errors that make their wa…
What is C++ STL?: STL stands for Standard Template Library and is a part of standard C++ libraries. It contains many useful data structures (containers) and algorithms, which can spare you a lot of the time. Today we will look at the STL Vector. …
The viewer will learn how to user default arguments when defining functions. This method of defining functions will be contrasted with the non-default-argument of defining functions.
The viewer will learn how to clear a vector as well as how to detect empty vectors in C++.
Suggested Courses

764 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question