Solved

bits in c++

Posted on 2006-07-13
9
636 Views
Last Modified: 2008-02-01
hi.

i am attempting to write a Huffman compression library, and so far i have the algorithm correct. i am trying to apply this now to compressing an arbitrary file, however i cannot come up with an efficient way of writing the bits to compress the data into a file.

i am trying to use the fstream classes, however i really don't know how to handle the data. i don't know how to write individual bits to a file. i have searched the internet to death on this, and i can't seem to find anything helpful.
0
Comment
Question by:cimnik029
9 Comments
 
LVL 9

Expert Comment

by:smidgie82
ID: 17104117
Hi cimnik029,

The smallest atomic datatype in C++ is a byte, so there's no platform-independent way to write a single bit.  What you would be better off doing would be making a function call that would accept a single byte, but only use the lowest byte, and write it to a stream object that would buffer it until it had 8 queued, then write them as a byte.

Cheers!
0
 

Author Comment

by:cimnik029
ID: 17104499
i was afraid i would end up with something like that. i considered doing something like that, but the problem is that each letter is represented by a variable length string of bits. so great complexity arises when buffering like to avoid an overrun. what happens when 2 bits are left and i try to write 3?

i was hoping that someone who has "been there" and solved it might have some insight.
0
 
LVL 9

Accepted Solution

by:
smidgie82 earned 125 total points
ID: 17104727
You want to treat the byte-by-byte output stream as a bitstream, right?  If that's the case, then when one character spans two bytes, you just fill the first byte with as many bits as it'll hold, then write it out, and buffer the rest of the character in the next byte, which will wait to be written until it's been filled.  It's quite simple to implement the output buffering and splitting codes amongst bytes.  The concern would come in reading them back in later.  But since you're doing a Huffman encoding, you're guaranteed that a given symbol is represented by a set of bits that isn't a prefix of any other symbol's code.  So, when decoding, there won't be any confusion as to whether the bits in the next byte are part of the current symbol.  In fact, you can completely abstract the buffering by writing a function, such as GetNextSymbol or GetNextCode, which will read from the bitstream and the frequency table and pull out a single code.

Buffering bits into bytes and then writing them automically is really the only solution, and it's not as bad as it sounds.  I did exactly the same thing while encapsulating steganography functionality in a Java OutputStream.  The only trick comes at the end.  But as above, since you're doing Huffman encoding, you can easily generate an unambiguous terminal code that represents end-of-file.
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 
LVL 86

Assisted Solution

by:jkr
jkr earned 125 total points
ID: 17105191
Have you seen 'std::bitset'? Check out http://www.sgi.com/tech/stl/bitset.html

int main() {
  const bitset<12> mask(2730ul);
  cout << "mask =      " << mask << endl;

  bitset<12> x;

  cout << "Enter a 12-bit bitset in binary: " << flush;
  if (cin >> x) {
    cout << "x =        " << x << endl;
    cout << "As ulong:  " << x.to_ulong() << endl;
    cout << "And with mask: " << (x & mask) << endl;
    cout << "Or with mask:  " << (x | mask) << endl;

    cout << "Setting bit #6" << endl;

    x[5] = 1;

    cout << "x =        " << x << endl;
    cout << "As ulong:  " << x.to_ulong() << endl;
    cout << "And with mask: " << (x & mask) << endl;
    cout << "Or with mask:  " << (x | mask) << endl;

  }
}

That one you can easily use with an fstream.
0
 
LVL 3

Assisted Solution

by:Dawaffleman
Dawaffleman earned 125 total points
ID: 17105967
there is a way I isolated the bits in a single byte, i found this method in a C++ book:
basically you make a structure and after the variable name you put a colon and the number of bits that the variable should take up.  you declare the var as unsigned but it will act like bits i believe. just as an example, my code went like this:

struct BIT{
 unsigned eight : 1;
 unsigned seven : 1;
 unsigned six : 1;
 unsigned five : 1;
 unsigned four : 1;
 unsigned three : 1;
 unsigned two : 1;
 unsigned one : 1;
};

struct NIBBLE{
 unsigned two : 4;
 unsigned one : 4;
};


then for each byte i could access a certain bit or nibble ie  byteOne.eight  would return the eigth bit in that byte.

note that if you store a byte to this structure then it will be stored in reverse order on an intel processor, so i believe that the declaration order matters.

hope this helps
0
 
LVL 9

Expert Comment

by:smidgie82
ID: 17108058
@Dawaffleman,

I agree, bitfields could help simplify the syntax involved, but ultimately I don't think it helps solve the compaction problem.  In order to avoid losing the space benefits gained from the Huffman encoding in the first place, you have to store the codes as a contiguous bitstream.  So eventually you'll wind up having to split them across multiple bytes.

@jkr,

Probably the bitset class would help simplify the coding even more than primitive bitfields, but don't you have to set the size of the bitfield explicitly?  So, you could allocate an appropriately sized bitset for each code that you want to write, which would be fine.  I don't believe this class is a good solution to the serialization problem, though, because its stream insertion and extraction operators serialize and deserialize the bitset as a character string representation of the corresponding 1's and 0's.  So instead of saving space by using the Huffman encoding, you'll actually be taking up significantly more space, as each bit of the code will take up a full byte in the output stream.  It'd probably be a great solution for debugging and making sure the encoding is being written out correctly, though.

I think the best solution is to write two stream classes that inherit from ostream and istream.  Override the base class stream insertion and extraction operators so they'll handle single bits and buffer them eight at a time before writing.  Also override the flush() method, to make sure it doesn't mess anything up if called before writing is finished, and override the close() method to append an "End-of-file" code to the currently buffered data, pad the data with zeroes to fill the last byte (if necessary), and write the data out.  Compared to the task of writing a working Huffman encoder in the first place, this is a fairly straightforward and lightweight process.
0
 
LVL 39

Assisted Solution

by:itsmeandnobodyelse
itsmeandnobodyelse earned 125 total points
ID: 17109900
>>>> however i cannot come up with an efficient way of writing the bits to compress the data into a file.

I don't think it makes sense to write single bits to a file. Better try to setup a possible big buffer - say 64 k - in memory and write it to file if the buffer is full or you are thru. You could create an own buffer class which supports streaming and where you might stream single bits as well:

#include <iostream>
#include <fstream>
using namespace std;

class Bits
{
     unsigned int buf;
     int               bits;
public:
     Bits(unsigned int ui) : buf(ui), bits(0) { while (ui>>bits) bits++; }
     Bits(unsigned int ui, int high) : buf(ui), bits(high) {}
     Bits(int offset, const Bits& b) : bits(offset + b.bits) { buf = b.buf<<offset; }
     operator unsigned int() { return buf; }
     bool empty() { return buf == 0; }

     friend class Buffer;
};

class Buffer
{
    unsigned int    buf[4096];
    int             bitsFilled;
    ostream&        ofs;
public:
    Buffer(ostream& os) : bitsFilled(0), ofs(os) { memset(buf, 0, sizeof(buf)); }
    operator bool() { return (bool)ofs; }
    Buffer& operator << (const Bits& bits)
    {
          int bitsFilledNew = bitsFilled + bits.bits;
          int curIdx            = bitsFilled>>5;
          int offset            = (bitsFilled & 0x1f);
          Bits   b1(offset, bits);
          Bits   b2 = bits.buf >> (offset + bits.bits);        
          *(buf + curIdx) |= (unsigned int)b1;
          if (++curIdx >= sizeof(buf)/sizeof(unsigned int))
          {
                if (!ofs.write((char*)buf, sizeof(buf)))
                     return *this;
                memset(buf, 0, sizeof(buf));
                bitsFilledNew -= sizeof(buf) * 8;
                curIdx = 0;
          }
          *(buf + curIdx) = (unsigned int)b2;
          bitsFilled = bitsFilledNew ;
          return *this;
    }        
};

Regards, Alex
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Introduction This article is a continuation of the C/C++ Visual Studio Express debugger series. Part 1 provided a quick start guide in using the debugger. Part 2 focused on additional topics in breakpoints. As your assignments become a little more …
Many modern programming languages support the concept of a property -- a class member that combines characteristics of both a data member and a method.  These are sometimes called "smart fields" because you can add logic that is applied automaticall…
The viewer will learn how to use the return statement in functions in C++. The video will also teach the user how to pass data to a function and have the function return data back for further processing.
The viewer will be introduced to the technique of using vectors in C++. The video will cover how to define a vector, store values in the vector and retrieve data from the values stored in the vector.

920 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now