Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win


bits in c++

Posted on 2006-07-13
Medium Priority
Last Modified: 2008-02-01

i am attempting to write a Huffman compression library, and so far i have the algorithm correct. i am trying to apply this now to compressing an arbitrary file, however i cannot come up with an efficient way of writing the bits to compress the data into a file.

i am trying to use the fstream classes, however i really don't know how to handle the data. i don't know how to write individual bits to a file. i have searched the internet to death on this, and i can't seem to find anything helpful.
Question by:cimnik029
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions

Expert Comment

ID: 17104117
Hi cimnik029,

The smallest atomic datatype in C++ is a byte, so there's no platform-independent way to write a single bit.  What you would be better off doing would be making a function call that would accept a single byte, but only use the lowest byte, and write it to a stream object that would buffer it until it had 8 queued, then write them as a byte.


Author Comment

ID: 17104499
i was afraid i would end up with something like that. i considered doing something like that, but the problem is that each letter is represented by a variable length string of bits. so great complexity arises when buffering like to avoid an overrun. what happens when 2 bits are left and i try to write 3?

i was hoping that someone who has "been there" and solved it might have some insight.

Accepted Solution

smidgie82 earned 500 total points
ID: 17104727
You want to treat the byte-by-byte output stream as a bitstream, right?  If that's the case, then when one character spans two bytes, you just fill the first byte with as many bits as it'll hold, then write it out, and buffer the rest of the character in the next byte, which will wait to be written until it's been filled.  It's quite simple to implement the output buffering and splitting codes amongst bytes.  The concern would come in reading them back in later.  But since you're doing a Huffman encoding, you're guaranteed that a given symbol is represented by a set of bits that isn't a prefix of any other symbol's code.  So, when decoding, there won't be any confusion as to whether the bits in the next byte are part of the current symbol.  In fact, you can completely abstract the buffering by writing a function, such as GetNextSymbol or GetNextCode, which will read from the bitstream and the frequency table and pull out a single code.

Buffering bits into bytes and then writing them automically is really the only solution, and it's not as bad as it sounds.  I did exactly the same thing while encapsulating steganography functionality in a Java OutputStream.  The only trick comes at the end.  But as above, since you're doing Huffman encoding, you can easily generate an unambiguous terminal code that represents end-of-file.
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

LVL 86

Assisted Solution

jkr earned 500 total points
ID: 17105191
Have you seen 'std::bitset'? Check out http://www.sgi.com/tech/stl/bitset.html

int main() {
  const bitset<12> mask(2730ul);
  cout << "mask =      " << mask << endl;

  bitset<12> x;

  cout << "Enter a 12-bit bitset in binary: " << flush;
  if (cin >> x) {
    cout << "x =        " << x << endl;
    cout << "As ulong:  " << x.to_ulong() << endl;
    cout << "And with mask: " << (x & mask) << endl;
    cout << "Or with mask:  " << (x | mask) << endl;

    cout << "Setting bit #6" << endl;

    x[5] = 1;

    cout << "x =        " << x << endl;
    cout << "As ulong:  " << x.to_ulong() << endl;
    cout << "And with mask: " << (x & mask) << endl;
    cout << "Or with mask:  " << (x | mask) << endl;


That one you can easily use with an fstream.

Assisted Solution

Dawaffleman earned 500 total points
ID: 17105967
there is a way I isolated the bits in a single byte, i found this method in a C++ book:
basically you make a structure and after the variable name you put a colon and the number of bits that the variable should take up.  you declare the var as unsigned but it will act like bits i believe. just as an example, my code went like this:

struct BIT{
 unsigned eight : 1;
 unsigned seven : 1;
 unsigned six : 1;
 unsigned five : 1;
 unsigned four : 1;
 unsigned three : 1;
 unsigned two : 1;
 unsigned one : 1;

struct NIBBLE{
 unsigned two : 4;
 unsigned one : 4;

then for each byte i could access a certain bit or nibble ie  byteOne.eight  would return the eigth bit in that byte.

note that if you store a byte to this structure then it will be stored in reverse order on an intel processor, so i believe that the declaration order matters.

hope this helps

Expert Comment

ID: 17108058

I agree, bitfields could help simplify the syntax involved, but ultimately I don't think it helps solve the compaction problem.  In order to avoid losing the space benefits gained from the Huffman encoding in the first place, you have to store the codes as a contiguous bitstream.  So eventually you'll wind up having to split them across multiple bytes.


Probably the bitset class would help simplify the coding even more than primitive bitfields, but don't you have to set the size of the bitfield explicitly?  So, you could allocate an appropriately sized bitset for each code that you want to write, which would be fine.  I don't believe this class is a good solution to the serialization problem, though, because its stream insertion and extraction operators serialize and deserialize the bitset as a character string representation of the corresponding 1's and 0's.  So instead of saving space by using the Huffman encoding, you'll actually be taking up significantly more space, as each bit of the code will take up a full byte in the output stream.  It'd probably be a great solution for debugging and making sure the encoding is being written out correctly, though.

I think the best solution is to write two stream classes that inherit from ostream and istream.  Override the base class stream insertion and extraction operators so they'll handle single bits and buffer them eight at a time before writing.  Also override the flush() method, to make sure it doesn't mess anything up if called before writing is finished, and override the close() method to append an "End-of-file" code to the currently buffered data, pad the data with zeroes to fill the last byte (if necessary), and write the data out.  Compared to the task of writing a working Huffman encoder in the first place, this is a fairly straightforward and lightweight process.
LVL 39

Assisted Solution

itsmeandnobodyelse earned 500 total points
ID: 17109900
>>>> however i cannot come up with an efficient way of writing the bits to compress the data into a file.

I don't think it makes sense to write single bits to a file. Better try to setup a possible big buffer - say 64 k - in memory and write it to file if the buffer is full or you are thru. You could create an own buffer class which supports streaming and where you might stream single bits as well:

#include <iostream>
#include <fstream>
using namespace std;

class Bits
     unsigned int buf;
     int               bits;
     Bits(unsigned int ui) : buf(ui), bits(0) { while (ui>>bits) bits++; }
     Bits(unsigned int ui, int high) : buf(ui), bits(high) {}
     Bits(int offset, const Bits& b) : bits(offset + b.bits) { buf = b.buf<<offset; }
     operator unsigned int() { return buf; }
     bool empty() { return buf == 0; }

     friend class Buffer;

class Buffer
    unsigned int    buf[4096];
    int             bitsFilled;
    ostream&        ofs;
    Buffer(ostream& os) : bitsFilled(0), ofs(os) { memset(buf, 0, sizeof(buf)); }
    operator bool() { return (bool)ofs; }
    Buffer& operator << (const Bits& bits)
          int bitsFilledNew = bitsFilled + bits.bits;
          int curIdx            = bitsFilled>>5;
          int offset            = (bitsFilled & 0x1f);
          Bits   b1(offset, bits);
          Bits   b2 = bits.buf >> (offset + bits.bits);        
          *(buf + curIdx) |= (unsigned int)b1;
          if (++curIdx >= sizeof(buf)/sizeof(unsigned int))
                if (!ofs.write((char*)buf, sizeof(buf)))
                     return *this;
                memset(buf, 0, sizeof(buf));
                bitsFilledNew -= sizeof(buf) * 8;
                curIdx = 0;
          *(buf + curIdx) = (unsigned int)b2;
          bitsFilled = bitsFilledNew ;
          return *this;

Regards, Alex

Featured Post

Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction This article is the first in a series of articles about the C/C++ Visual Studio Express debugger.  It provides a quick start guide in using the debugger. Part 2 focuses on additional topics in breakpoints.  Lastly, Part 3 focuses on th…
Introduction This article is a continuation of the C/C++ Visual Studio Express debugger series. Part 1 provided a quick start guide in using the debugger. Part 2 focused on additional topics in breakpoints. As your assignments become a little more …
The viewer will learn how to user default arguments when defining functions. This method of defining functions will be contrasted with the non-default-argument of defining functions.
The viewer will learn how to clear a vector as well as how to detect empty vectors in C++.
Suggested Courses

618 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question