Solved

Writing Huffman Code Bits to a file without padding

Posted on 2003-11-05
8
1,384 Views
Last Modified: 2007-12-19
I'm currently writing a Huffman encoding program and I'm horribly stumped as to how to output the bits to the file without the extra bits at the end of the last byte.  Is there a way to truncate the last byte?? pseudocode and/or C++ code would be greatly appreciated.  Thanks in advance.
0
Comment
Question by:ashsaber
8 Comments
 
LVL 16

Expert Comment

by:_nn_
ID: 9685758
A file (for that matter, memory behaves the same) is a collection of bytes. That's the smallest unit. You can zero these last bits if you want, but I don't think there's a way to cut them off. At best, I could imagine that you could make some use of these bits by storing there the first bits of the next sequence. I'm not sure it would be so useful though.

What is the exact trouble with the spurious bits ? Please show us what you've done so far, that would certainly help.
0
 
LVL 5

Expert Comment

by:mtmike
ID: 9686111
You can't truncate the byte. I assume you need this for properly decoding an encoded file.

This can be accomplished by storing the number of valid bits in the last byte somewhere in the header of the encoded file. Storing a value of 1-8 only takes three bits.
0
 
LVL 11

Expert Comment

by:bcladd
ID: 9686233
The suggestion of encoding the number of spare bits in the last byte makes sense and only costs you 3 bits as mtmike says. An alternative is to encode an end of file character and stop decoding the file when the EOF is decoded. Advantage of EOF is that it would permit multiple encoded files in one file (assuming they use the same encoding tree). Disadvantage is that it is EOF is a rare character and it is likely to cost more than 3 bits in the stream.

One disadvantage of encoding the number of bits in the last byte (or the number of slack bits, same difference) is you need to know that count earlier, probably in the file header. This means you either have to go back and do fixup in the header or encode the entire file in memory before writing it out. Not as clean as generating a small buffer of bits and writing whenever it is full. More memory management issues.

Just some thinking about it.

-bcl
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 
LVL 5

Accepted Solution

by:
mtmike earned 168 total points
ID: 9686458
I think that you can compute the number of output bits in advance using the frequency table and bit lengths of the encoded characters.

Suppose the input consists of only the characters 'a', 'b' and 'c' and

a: frequency=40 encoding=1
b: frequency=10 encoding=01
c: frequency=12 encoding=00

Then the number of output bits is 40 * 1 + 10 * 2 + 12 * 2 = 84
0
 
LVL 11

Assisted Solution

by:bcladd
bcladd earned 166 total points
ID: 9686462
Yes, of course you can. My brain damage is acting up again.

-bcl
0
 
LVL 7

Assisted Solution

by:burcarpat
burcarpat earned 166 total points
ID: 9691207
alternatively, you can use boost dynamic_bitset for storage and it has support for iostreams:

    http://www.boost.org/libs/dynamic_bitset/dynamic_bitset.html

[ about boost.org :: boost.org is an organization supported by many c++ standards committee members and provides 100% free, peer-reviewed, cross-platform libraries.  many of the boost libraries, such as their smart pointer library, are already in the drafts of the next revision of the c++ standard ]

-- ba
0
 
LVL 9

Expert Comment

by:tinchos
ID: 10242523
No comment has been added lately, so it's time to clean up this TA.
I will leave the following recommendation for this question in the Cleanup topic area:

Split: mtmike {http:#9686458} & bcladd {http:#9686462} & burcarpat {http:#9691207}

Please leave any comments here within the next seven days.
PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

Tinchos
EE Cleanup Volunteer
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In days of old, returning something by value from a function in C++ was necessarily avoided because it would, invariably, involve one or even two copies of the object being created and potentially costly calls to a copy-constructor and destructor. A…
Introduction This article is the first in a series of articles about the C/C++ Visual Studio Express debugger.  It provides a quick start guide in using the debugger. Part 2 focuses on additional topics in breakpoints.  Lastly, Part 3 focuses on th…
The goal of the tutorial is to teach the user how to use functions in C++. The video will cover how to define functions, how to call functions and how to create functions prototypes. Microsoft Visual C++ 2010 Express will be used as a text editor an…
The viewer will learn how to clear a vector as well as how to detect empty vectors in C++.

895 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now