cimnik029
asked on
Bits in C++
Hi.
I am attempting to write a Huffman compression library, and so far I have the algorithm correct. I am now trying to apply it to compressing an arbitrary file, but I cannot come up with an efficient way of writing the compressed bits out to a file.
I am trying to use the fstream classes, but I really don't know how to handle the data; I don't know how to write individual bits to a file. I have searched the internet to death on this and can't seem to find anything helpful.
ASKER
I was afraid I would end up with something like that. I considered doing something similar, but the problem is that each letter is represented by a variable-length string of bits, so great complexity arises when buffering like that to avoid an overrun. What happens when 2 bits of space are left in the buffer and I try to write 3?
i was hoping that someone who has "been there" and solved it might have some insight.
@Dawaffleman,
I agree, bitfields could help simplify the syntax involved, but ultimately I don't think they solve the compaction problem. To avoid losing the space benefits gained from the Huffman encoding in the first place, you have to store the codes as a contiguous bitstream, so eventually you'll wind up having to split codes across multiple bytes.
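To make the byte-splitting concrete, here's a minimal sketch of a packer that appends the low `length` bits of each code (most-significant bit first) to a contiguous byte buffer. The class and its names are illustrative, not a fixed API:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: pack variable-length codes into a contiguous
// bitstream, letting codes straddle byte boundaries as needed.
class BitPacker {
public:
    // Append the low `length` bits of `code`, most-significant first.
    void append(uint32_t code, int length) {
        for (int i = length - 1; i >= 0; --i) {
            cur_ = static_cast<uint8_t>((cur_ << 1) | ((code >> i) & 1u));
            if (++nbits_ == 8) {          // byte full: move it to the buffer
                bytes_.push_back(cur_);
                cur_ = 0;
                nbits_ = 0;
            }
        }
    }
    // Zero-pad any final partial byte and return the packed stream.
    std::vector<uint8_t> finish() {
        if (nbits_ > 0) {
            bytes_.push_back(static_cast<uint8_t>(cur_ << (8 - nbits_)));
            cur_ = 0;
            nbits_ = 0;
        }
        return bytes_;
    }
private:
    std::vector<uint8_t> bytes_;
    uint8_t cur_ = 0;   // bits accumulated toward the current byte
    int nbits_ = 0;     // how many bits of cur_ are valid (0..7)
};
```

Appending a 3-bit code `101` and then a 5-bit code `11001` yields the single packed byte `10111001`, with the second code's bits simply continuing where the first left off.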
@jkr,
Probably the bitset class would simplify the coding even more than primitive bitfields, but doesn't std::bitset require its size to be fixed at compile time as a template parameter? Still, you could use an appropriately sized bitset for each code that you want to write, which would be fine. I don't believe this class is a good solution to the serialization problem, though, because its stream insertion and extraction operators serialize and deserialize the bitset as a character-string representation of the corresponding 1's and 0's. So instead of saving space through the Huffman encoding, you'll actually take up significantly more space, as each bit of the code will occupy a full byte in the output stream. It would probably be a great tool for debugging and verifying that the encoding is written out correctly, though.
I think the best solution is to write two stream classes that inherit from ostream and istream. Define stream insertion and extraction operators so they'll handle single bits and buffer them eight at a time before writing. Also override the flush() method, to make sure it doesn't mess anything up if called before writing is finished, and the close() method to append an "end-of-file" code to the currently buffered data, pad the data with zeroes to fill the last byte (if necessary), and write the data out. Compared to the task of writing a working Huffman encoder in the first place, this is a fairly straightforward and lightweight process.
The smallest addressable unit in C++ is a byte, so there's no platform-independent way to write a single bit. What you would be better off doing is making a function call that accepts a single byte but only uses its lowest bit, writing it to a stream object that buffers the bits until it has 8 queued, then writes them out as a byte.
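In sketch form, that buffering call might look like this. The struct and names are illustrative, and a real version would also need a flush step for a final partial byte:

```cpp
#include <cstdint>
#include <cstdio>

// Accumulates queued bits between calls to write_bit().
struct BitBuffer {
    uint8_t bits = 0;  // bits queued so far, high bit first
    int n = 0;         // how many are queued (0..7)
};

// Takes a whole byte but uses only its lowest bit; once eight bits
// are queued, one packed byte is written to `out`.
void write_bit(BitBuffer& state, uint8_t byte, std::FILE* out) {
    state.bits = static_cast<uint8_t>((state.bits << 1) | (byte & 1u));
    if (++state.n == 8) {
        std::fputc(state.bits, out);
        state.bits = 0;
        state.n = 0;
    }
}
```

Feeding it the bits 1,0,1,1,1,0,0,1 one call at a time produces the single byte `10111001` in the file.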
Cheers!