Solved

bits in c++

Posted on 2006-07-13
9
634 Views
Last Modified: 2008-02-01
hi.

i am attempting to write a Huffman compression library, and so far i have the algorithm correct. i am trying to apply this now to compressing an arbitrary file, however i cannot come up with an efficient way of writing the bits to compress the data into a file.

i am trying to use the fstream classes, however i really don't know how to handle the data. i don't know how to write individual bits to a file. i have searched the internet to death on this, and i can't seem to find anything helpful.
0
Comment
Question by:cimnik029
9 Comments
 
LVL 9

Expert Comment

by:smidgie82
Comment Utility
Hi cimnik029,

The smallest atomic datatype in C++ is a byte, so there's no platform-independent way to write a single bit.  What you would be better off doing would be making a function call that would accept a single byte, but only use the lowest byte, and write it to a stream object that would buffer it until it had 8 queued, then write them as a byte.

Cheers!
0
 

Author Comment

by:cimnik029
Comment Utility
i was afraid i would end up with something like that. i considered doing something like that, but the problem is that each letter is represented by a variable length string of bits. so great complexity arises when buffering like to avoid an overrun. what happens when 2 bits are left and i try to write 3?

i was hoping that someone who has "been there" and solved it might have some insight.
0
 
LVL 9

Accepted Solution

by:
smidgie82 earned 125 total points
Comment Utility
You want to treat the byte-by-byte output stream as a bitstream, right?  If that's the case, then when one character spans two bytes, you just fill the first byte with as many bits as it'll hold, then write it out, and buffer the rest of the character in the next byte, which will wait to be written until it's been filled.  It's quite simple to implement the output buffering and splitting codes amongst bytes.  The concern would come in reading them back in later.  But since you're doing a Huffman encoding, you're guaranteed that a given symbol is represented by a set of bits that isn't a prefix of any other symbol's code.  So, when decoding, there won't be any confusion as to whether the bits in the next byte are part of the current symbol.  In fact, you can completely abstract the buffering by writing a function, such as GetNextSymbol or GetNextCode, which will read from the bitstream and the frequency table and pull out a single code.

Buffering bits into bytes and then writing them automically is really the only solution, and it's not as bad as it sounds.  I did exactly the same thing while encapsulating steganography functionality in a Java OutputStream.  The only trick comes at the end.  But as above, since you're doing Huffman encoding, you can easily generate an unambiguous terminal code that represents end-of-file.
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 86

Assisted Solution

by:jkr
jkr earned 125 total points
Comment Utility
Have you seen 'std::bitset'? Check out http://www.sgi.com/tech/stl/bitset.html

int main() {
  const bitset<12> mask(2730ul);
  cout << "mask =      " << mask << endl;

  bitset<12> x;

  cout << "Enter a 12-bit bitset in binary: " << flush;
  if (cin >> x) {
    cout << "x =        " << x << endl;
    cout << "As ulong:  " << x.to_ulong() << endl;
    cout << "And with mask: " << (x & mask) << endl;
    cout << "Or with mask:  " << (x | mask) << endl;

    cout << "Setting bit #6" << endl;

    x[5] = 1;

    cout << "x =        " << x << endl;
    cout << "As ulong:  " << x.to_ulong() << endl;
    cout << "And with mask: " << (x & mask) << endl;
    cout << "Or with mask:  " << (x | mask) << endl;

  }
}

That one you can easily use with an fstream.
0
 
LVL 3

Assisted Solution

by:Dawaffleman
Dawaffleman earned 125 total points
Comment Utility
there is a way I isolated the bits in a single byte, i found this method in a C++ book:
basically you make a structure and after the variable name you put a colon and the number of bits that the variable should take up.  you declare the var as unsigned but it will act like bits i believe. just as an example, my code went like this:

struct BIT{
 unsigned eight : 1;
 unsigned seven : 1;
 unsigned six : 1;
 unsigned five : 1;
 unsigned four : 1;
 unsigned three : 1;
 unsigned two : 1;
 unsigned one : 1;
};

struct NIBBLE{
 unsigned two : 4;
 unsigned one : 4;
};


then for each byte i could access a certain bit or nibble ie  byteOne.eight  would return the eigth bit in that byte.

note that if you store a byte to this structure then it will be stored in reverse order on an intel processor, so i believe that the declaration order matters.

hope this helps
0
 
LVL 9

Expert Comment

by:smidgie82
Comment Utility
@Dawaffleman,

I agree, bitfields could help simplify the syntax involved, but ultimately I don't think it helps solve the compaction problem.  In order to avoid losing the space benefits gained from the Huffman encoding in the first place, you have to store the codes as a contiguous bitstream.  So eventually you'll wind up having to split them across multiple bytes.

@jkr,

Probably the bitset class would help simplify the coding even more than primitive bitfields, but don't you have to set the size of the bitfield explicitly?  So, you could allocate an appropriately sized bitset for each code that you want to write, which would be fine.  I don't believe this class is a good solution to the serialization problem, though, because its stream insertion and extraction operators serialize and deserialize the bitset as a character string representation of the corresponding 1's and 0's.  So instead of saving space by using the Huffman encoding, you'll actually be taking up significantly more space, as each bit of the code will take up a full byte in the output stream.  It'd probably be a great solution for debugging and making sure the encoding is being written out correctly, though.

I think the best solution is to write two stream classes that inherit from ostream and istream.  Override the base class stream insertion and extraction operators so they'll handle single bits and buffer them eight at a time before writing.  Also override the flush() method, to make sure it doesn't mess anything up if called before writing is finished, and override the close() method to append an "End-of-file" code to the currently buffered data, pad the data with zeroes to fill the last byte (if necessary), and write the data out.  Compared to the task of writing a working Huffman encoder in the first place, this is a fairly straightforward and lightweight process.
0
 
LVL 39

Assisted Solution

by:itsmeandnobodyelse
itsmeandnobodyelse earned 125 total points
Comment Utility
>>>> however i cannot come up with an efficient way of writing the bits to compress the data into a file.

I don't think it makes sense to write single bits to a file. Better try to setup a possible big buffer - say 64 k - in memory and write it to file if the buffer is full or you are thru. You could create an own buffer class which supports streaming and where you might stream single bits as well:

#include <iostream>
#include <fstream>
using namespace std;

class Bits
{
     unsigned int buf;
     int               bits;
public:
     Bits(unsigned int ui) : buf(ui), bits(0) { while (ui>>bits) bits++; }
     Bits(unsigned int ui, int high) : buf(ui), bits(high) {}
     Bits(int offset, const Bits& b) : bits(offset + b.bits) { buf = b.buf<<offset; }
     operator unsigned int() { return buf; }
     bool empty() { return buf == 0; }

     friend class Buffer;
};

class Buffer
{
    unsigned int    buf[4096];
    int             bitsFilled;
    ostream&        ofs;
public:
    Buffer(ostream& os) : bitsFilled(0), ofs(os) { memset(buf, 0, sizeof(buf)); }
    operator bool() { return (bool)ofs; }
    Buffer& operator << (const Bits& bits)
    {
          int bitsFilledNew = bitsFilled + bits.bits;
          int curIdx            = bitsFilled>>5;
          int offset            = (bitsFilled & 0x1f);
          Bits   b1(offset, bits);
          Bits   b2 = bits.buf >> (offset + bits.bits);        
          *(buf + curIdx) |= (unsigned int)b1;
          if (++curIdx >= sizeof(buf)/sizeof(unsigned int))
          {
                if (!ofs.write((char*)buf, sizeof(buf)))
                     return *this;
                memset(buf, 0, sizeof(buf));
                bitsFilledNew -= sizeof(buf) * 8;
                curIdx = 0;
          }
          *(buf + curIdx) = (unsigned int)b2;
          bitsFilled = bitsFilledNew ;
          return *this;
    }        
};

Regards, Alex
0

Featured Post

Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

Join & Write a Comment

Unlike C#, C++ doesn't have native support for sealing classes (so they cannot be sub-classed). At the cost of a virtual base class pointer it is possible to implement a pseudo sealing mechanism The trick is to virtually inherit from a base class…
IntroductionThis article is the second in a three part article series on the Visual Studio 2008 Debugger.  It provides tips in setting and using breakpoints. If not familiar with this debugger, you can find a basic introduction in the EE article loc…
The viewer will learn how to use the return statement in functions in C++. The video will also teach the user how to pass data to a function and have the function return data back for further processing.
The viewer will be introduced to the technique of using vectors in C++. The video will cover how to define a vector, store values in the vector and retrieve data from the values stored in the vector.

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now