bits in c++


i am attempting to write a Huffman compression library, and so far i have the algorithm correct. i am trying to apply this now to compressing an arbitrary file, however i cannot come up with an efficient way of writing the bits to compress the data into a file.

i am trying to use the fstream classes, however i really don't know how to handle the data. i don't know how to write individual bits to a file. i have searched the internet to death on this, and i can't seem to find anything helpful.
Who is Participating?
You want to treat the byte-by-byte output stream as a bitstream, right?  If that's the case, then when one character spans two bytes, you just fill the first byte with as many bits as it'll hold, then write it out, and buffer the rest of the character in the next byte, which will wait to be written until it's been filled.  It's quite simple to implement the output buffering and splitting codes amongst bytes.  The concern would come in reading them back in later.  But since you're doing a Huffman encoding, you're guaranteed that a given symbol is represented by a set of bits that isn't a prefix of any other symbol's code.  So, when decoding, there won't be any confusion as to whether the bits in the next byte are part of the current symbol.  In fact, you can completely abstract the buffering by writing a function, such as GetNextSymbol or GetNextCode, which will read from the bitstream and the frequency table and pull out a single code.

Buffering bits into bytes and then writing them automically is really the only solution, and it's not as bad as it sounds.  I did exactly the same thing while encapsulating steganography functionality in a Java OutputStream.  The only trick comes at the end.  But as above, since you're doing Huffman encoding, you can easily generate an unambiguous terminal code that represents end-of-file.
Hi cimnik029,

The smallest atomic datatype in C++ is a byte, so there's no platform-independent way to write a single bit.  What you would be better off doing would be making a function call that would accept a single byte, but only use the lowest byte, and write it to a stream object that would buffer it until it had 8 queued, then write them as a byte.

cimnik029Author Commented:
i was afraid i would end up with something like that. i considered doing something like that, but the problem is that each letter is represented by a variable length string of bits. so great complexity arises when buffering like to avoid an overrun. what happens when 2 bits are left and i try to write 3?

i was hoping that someone who has "been there" and solved it might have some insight.
Cloud Class® Course: MCSA MCSE Windows Server 2012

This course teaches how to install and configure Windows Server 2012 R2.  It is the first step on your path to becoming a Microsoft Certified Solutions Expert (MCSE).

Have you seen 'std::bitset'? Check out

int main() {
  const bitset<12> mask(2730ul);
  cout << "mask =      " << mask << endl;

  bitset<12> x;

  cout << "Enter a 12-bit bitset in binary: " << flush;
  if (cin >> x) {
    cout << "x =        " << x << endl;
    cout << "As ulong:  " << x.to_ulong() << endl;
    cout << "And with mask: " << (x & mask) << endl;
    cout << "Or with mask:  " << (x | mask) << endl;

    cout << "Setting bit #6" << endl;

    x[5] = 1;

    cout << "x =        " << x << endl;
    cout << "As ulong:  " << x.to_ulong() << endl;
    cout << "And with mask: " << (x & mask) << endl;
    cout << "Or with mask:  " << (x | mask) << endl;


That one you can easily use with an fstream.
there is a way I isolated the bits in a single byte, i found this method in a C++ book:
basically you make a structure and after the variable name you put a colon and the number of bits that the variable should take up.  you declare the var as unsigned but it will act like bits i believe. just as an example, my code went like this:

struct BIT{
 unsigned eight : 1;
 unsigned seven : 1;
 unsigned six : 1;
 unsigned five : 1;
 unsigned four : 1;
 unsigned three : 1;
 unsigned two : 1;
 unsigned one : 1;

struct NIBBLE{
 unsigned two : 4;
 unsigned one : 4;

then for each byte i could access a certain bit or nibble ie  byteOne.eight  would return the eigth bit in that byte.

note that if you store a byte to this structure then it will be stored in reverse order on an intel processor, so i believe that the declaration order matters.

hope this helps

I agree, bitfields could help simplify the syntax involved, but ultimately I don't think it helps solve the compaction problem.  In order to avoid losing the space benefits gained from the Huffman encoding in the first place, you have to store the codes as a contiguous bitstream.  So eventually you'll wind up having to split them across multiple bytes.


Probably the bitset class would help simplify the coding even more than primitive bitfields, but don't you have to set the size of the bitfield explicitly?  So, you could allocate an appropriately sized bitset for each code that you want to write, which would be fine.  I don't believe this class is a good solution to the serialization problem, though, because its stream insertion and extraction operators serialize and deserialize the bitset as a character string representation of the corresponding 1's and 0's.  So instead of saving space by using the Huffman encoding, you'll actually be taking up significantly more space, as each bit of the code will take up a full byte in the output stream.  It'd probably be a great solution for debugging and making sure the encoding is being written out correctly, though.

I think the best solution is to write two stream classes that inherit from ostream and istream.  Override the base class stream insertion and extraction operators so they'll handle single bits and buffer them eight at a time before writing.  Also override the flush() method, to make sure it doesn't mess anything up if called before writing is finished, and override the close() method to append an "End-of-file" code to the currently buffered data, pad the data with zeroes to fill the last byte (if necessary), and write the data out.  Compared to the task of writing a working Huffman encoder in the first place, this is a fairly straightforward and lightweight process.
>>>> however i cannot come up with an efficient way of writing the bits to compress the data into a file.

I don't think it makes sense to write single bits to a file. Better try to setup a possible big buffer - say 64 k - in memory and write it to file if the buffer is full or you are thru. You could create an own buffer class which supports streaming and where you might stream single bits as well:

#include <iostream>
#include <fstream>
using namespace std;

class Bits
     unsigned int buf;
     int               bits;
     Bits(unsigned int ui) : buf(ui), bits(0) { while (ui>>bits) bits++; }
     Bits(unsigned int ui, int high) : buf(ui), bits(high) {}
     Bits(int offset, const Bits& b) : bits(offset + b.bits) { buf = b.buf<<offset; }
     operator unsigned int() { return buf; }
     bool empty() { return buf == 0; }

     friend class Buffer;

class Buffer
    unsigned int    buf[4096];
    int             bitsFilled;
    ostream&        ofs;
    Buffer(ostream& os) : bitsFilled(0), ofs(os) { memset(buf, 0, sizeof(buf)); }
    operator bool() { return (bool)ofs; }
    Buffer& operator << (const Bits& bits)
          int bitsFilledNew = bitsFilled + bits.bits;
          int curIdx            = bitsFilled>>5;
          int offset            = (bitsFilled & 0x1f);
          Bits   b1(offset, bits);
          Bits   b2 = bits.buf >> (offset + bits.bits);        
          *(buf + curIdx) |= (unsigned int)b1;
          if (++curIdx >= sizeof(buf)/sizeof(unsigned int))
                if (!ofs.write((char*)buf, sizeof(buf)))
                     return *this;
                memset(buf, 0, sizeof(buf));
                bitsFilledNew -= sizeof(buf) * 8;
                curIdx = 0;
          *(buf + curIdx) = (unsigned int)b2;
          bitsFilled = bitsFilledNew ;
          return *this;

Regards, Alex
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.