Need help optimizing bitwise operations

Hey all. I'm trying to lower the size of a file (which is full of floats), and this is my current approach; it's something I cooked up real fast earlier today. Unfortunately, I feel the code isn't as optimal as it should be - maybe someone could help me improve it, both in speed and readability. Thanks!
string Int2BCD(string number) {
    string output;
 
    if (number.size() % 2 == 1)
        number.append("0");
 
    char c = '\0';
 
    for (unsigned int i = 0; i < number.size(); i += 2) {
        if (number[i] == '0') {
            c |= 0xC0;
        } else if (number[i] == '1') {
            c |= 0x10;
        } else if (number[i] == '2') {
            c |= 0x20;
        } else if (number[i] == '3') {
            c |= 0x30;
        } else if (number[i] == '4') {
            c |= 0x40;
        } else if (number[i] == '5') {
            c |= 0x50;
        } else if (number[i] == '6') {
            c |= 0x60;
        } else if (number[i] == '7') {
            c |= 0x70;
        } else if (number[i] == '8') {
            c |= 0x80;
        } else if (number[i] == '9') {
            c |= 0x90;
        } else if (number[i] == '-') {
            c |= 0xA0;
        } else if (number[i] == '.') {
            c |= 0xB0;
        }
 
        // i+1
 
        if (number[i + 1] == '0') {
            c |= 0x0C;
        } else if (number[i + 1] == '1') {
            c |= 0x01;
        } else if (number[i + 1] == '2') {
            c |= 0x02;
        } else if (number[i + 1] == '3') {
            c |= 0x03;
        } else if (number[i + 1] == '4') {
            c |= 0x04;
        } else if (number[i + 1] == '5') {
            c |= 0x05;
        } else if (number[i + 1] == '6') {
            c |= 0x06;
        } else if (number[i + 1] == '7') {
            c |= 0x07;
        } else if (number[i + 1] == '8') {
            c |= 0x08;
        } else if (number[i + 1] == '9') {
            c |= 0x09;
        } else if (number[i + 1] == '-') {
            c |= 0x0A;
        } else if (number[i + 1] == '.') {
            c |= 0x0B;
        }
 
        output += c;
        c = '\0';
    }
 
    return output;
}
 
---
 
string BCD2IntString(string number) {
    string output;
    for (unsigned int i = 0; i < number.size(); i++) {
        // Hi Nibble
        if ((number[i] & 0xF0) == 0xC0)
            output += "0";
        else if ((number[i] & 0xF0) == 0x10)
            output += "1";
        else if ((number[i] & 0xF0) == 0x20)
            output += "2";
        else if ((number[i] & 0xF0) == 0x30)
            output += "3";
        else if ((number[i] & 0xF0) == 0x40)
            output += "4";
        else if ((number[i] & 0xF0) == 0x50)
            output += "5";
        else if ((number[i] & 0xF0) == 0x60)
            output += "6";
        else if ((number[i] & 0xF0) == 0x70)
            output += "7";
        else if ((number[i] & 0xF0) == 0x80)
            output += "8";
        else if ((number[i] & 0xF0) == 0x90)
            output += "9";
        else if ((number[i] & 0xF0) == 0xA0)
            output += "-";
        else if ((number[i] & 0xF0) == 0xB0)
            output += ".";
 
        // Lo Nibble
 
        if ((number[i] & 0x0F) == 0x0C)
            output += "0";
        else if ((number[i] & 0x0F) == 0x01)
            output += "1";
        else if ((number[i] & 0x0F) == 0x02)
            output += "2";
        else if ((number[i] & 0x0F) == 0x03)
            output += "3";
        else if ((number[i] & 0x0F) == 0x04)
            output += "4";
        else if ((number[i] & 0x0F) == 0x05)
            output += "5";
        else if ((number[i] & 0x0F) == 0x06)
            output += "6";
        else if ((number[i] & 0x0F) == 0x07)
            output += "7";
        else if ((number[i] & 0x0F) == 0x08)
            output += "8";
        else if ((number[i] & 0x0F) == 0x09)
            output += "9";
        else if ((number[i] & 0x0F) == 0x0A)
            output += "-";
        else if ((number[i] & 0x0F) == 0x0B)
            output += ".";
    }
 
    return output;
}

Open in new window

holobytedAsked:
Who is Participating?
 
figrocConnect With a Mentor Commented:

inline int Char2Int(char c)
{
    swith (c)
    {
        case '-': return 0x0A;
        case '.': return 0x0B;
        case '0': return 0x0C;
        default:
            if ('1' <= c && c <= '9')
                return c - '0';
            break;
    }
    return 0;
}
 
inline char Int2Char(int i)
{
    swith (i)
    {
        case 0x0A: return '-';
        case 0x0B: return '.';
        case 0x0C: return '0';
        default:
            if (0x01 <= i && i <= 0x09)
                return (char)(i + '0');
            break;
    }
    return '\0';
}
 
---
 
string Int2BCD(string number) 
{
    string output;
 
    if (number.size() % 2 == 1)
        number.append("0");
 
    char c = '\0';
 
    for (unsigned int i = 0; i < number.size(); i += 2) {
        c = (Char2Int(number[i]) << 4) | Char2Int(number[i + 1]);
    }
 
    return output;
}
 
---
 
string BCD2IntString(string number) {
    string output;
    for (unsigned int i = 0; i < number.size(); i++) {
        output += Int2Char(number[i] >> 4);
        output += Int2Char(number[i] & 0x0F);
    }
 
    return output;
}

Open in new window

0
 
holobytedAuthor Commented:
That worked. Had to add an output += c you missed in Int2BCD.

How would you go on about creating a BCD2IntFloat function? (like BCD2IntString, but returning a float)
0
 
holobytedAuthor Commented:
Found a bug:

The marked line never prints anything. All it's returning is the number[i] & 0x0F bit.


Testing with the supplied main().
string BCD2IntString(string number) {
    string output;
    for (unsigned int i = 0; i < number.size(); i++) {
        --> output += Int2Char(number[i] >> 4); <--
        output += Int2Char(number[i] & 0x0F);
    }
 
    return output;
}
 
int main(int argc, char** argv) {
    float x = -1.105948;
    cout << Int2BCD(x) << endl;
    cout << BCD2IntString(Int2BCD(x)) << endl;
 
    return (EXIT_SUCCESS);
}

Open in new window

0
Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

 
Infinity08Connect With a Mentor Commented:
>> I'm trying to lower the size of a file (which is full of floats)

Why not store them as floats instead of strings then ?
0
 
figrocCommented:
Try this.
    output += Int2Char((number[i] >> 4) & 0x0F);
I think there is some problem with the value number[i], larger than 0xFF?
0
 
holobytedAuthor Commented:
>> Why not store them as floats instead of strings then ?

I'm trying to keep the resulting file compressable by using text-based compressor, which, AFAIK, lose their ability when dealing w/ binary files.

@figroc:

That worked, thanks!
0
 
sleep_pilotCommented:
If you want more speed, you can trade space for time.

conditional (e.g. if, switch) make the CPU do more work and make cause pipe stalling. (although modern day compiler does a really good job at optimizing it.)

You can trade space for time.  I changed figroc's Char2Int and Int2Char as the following:

You can remove the assert() if you are confidence your program would never go out of range.  I included them here for better coding style.

Since the 2 arrays are static, they will only be initialized once at program start.
#include <assert.h>
inline int Char2Int(char c)
{
    static int C2Iconv[128] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x0A, 0x0B, 0,
                             0x0A, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0,
                             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    assert( 0 <= c < 128);
    return(C2Iconv[c]);
}
 
inline char Int2Char(int i)
{
    static char I2Cconv[16] = { '\0', '1', '2', '3', '4', '5', '6', '7',
                                '8', '9', '-', '.', '0', '\0', '\0', '\0'};
    assert(0 <= i < 16);
    return (I2Cconv[i]);
}

Open in new window

0
 
sleep_pilotCommented:
Woops, I put the wrong table up.

Here is the correct array table.
#include <assert.h>
inline int Char2Int(char c)
{
    static int C2Iconv[128] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x0A, 0x0B, 0,
                             0x0C, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0,
                             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    assert( 0 <= c < 128);
    return(C2Iconv[c]);
}
 
inline char Int2Char(int i)
{
    static char I2Cconv[16] = { '\0', '1', '2', '3', '4', '5', '6', '7',
                                '8', '9', '-', '.', '0', '\0', '\0', '\0'};
    assert(0 <= i < 16);
    return (I2Cconv[i]);
}

Open in new window

0
 
holobytedAuthor Commented:
What about returning it as a float? Should I just use ostringstream/istringstream?

What other way can you think of to make it more compact, yet remain compressable by BZip2?


0
 
sleep_pilotCommented:
Back to your original question.

If you have a bunch of floating point numbers to store in a file, it's probably much smaller to just stored them in binary format.  For float and double, it's 4-bytes and 8-bytes respectively.  4 bytes as text can cover xx.x which is just 3 significant digits.  (Plus you still need a delimiter.)

Unless you need the floats to be human readable in the text file, I really don't see more bang-for-the-buck in terms of space.  Even if you stored them in binary form, bzip2 should still be able to compress it and just the saving of space alone in binary form should be more then enough to offset the not-as-good-compression.
0
 
Infinity08Commented:
>> If you have a bunch of floating point numbers to store in a file, it's probably much smaller to just stored them in binary format.

See http:#21810811 ;)
0
 
holobytedAuthor Commented:
Noted. However, there are some strings in the file; can both text and binary coexist in the same file?

My code is something like the following atm:


stringstream x;
x << string1 << endl << float1 << endl << float...N << endl;
outputfile << x;
outputfile.close();

What would the resulting code be?

Also, all of this information is currently being stored as a "Holder" class. In the future, I mean to have one "BigHolder" which could host several "Holders"... wow, talk about being abstract, hope that makes any sense... what would you suggest?


Thanks
0
 
Infinity08Commented:
>> can both text and binary coexist in the same file?

You can put text data in a binary file if you want.

Storing floats in a binary file is straightforward though. You just use write to put one or more floats in the file, and read to get them out again :

        http://www.cplusplus.com/reference/iostream/ostream/write.html
        http://www.cplusplus.com/reference/iostream/istream/read.html

(take a look at the code samples on those pages). Note that there are no newlines between the floats - they are stored one after the other without separator.
0
 
holobytedAuthor Commented:
Hrm. Say that I want to serialize this "Holder" class; is there something I could overload so I could do something like:

file.write(&holder); // Serialize holder
file.read(&holder2); // Deserialize holder into holder2

0
 
Infinity08Commented:
>> Say that I want to serialize this "Holder" class; is there something I could overload so I could do something like:

If the file will be read by the same application that wrote it, then you don't have to do anything special (note that the file won't be portable in that case).
0
 
holobytedAuthor Commented:
So I have no control over which fields get written to file and such?

That's the main reason I ended up doing text-based streams... :(
0
 
Infinity08Commented:
>> So I have no control over which fields get written to file and such?

If you want to, you can, just write your own writeToFile method (or whatever you want to call it) in the holder class, and let it write just the data you want ...
0
 
sleep_pilotCommented:
You can create a new class/struct that holds only the fields that you need to write to file, as the last step before writing to file.  That way, you don't need to change you program and you have control over which fields are written out.
0
 
holobytedAuthor Commented:
*nod* Shoulda seen that one coming. I guess I could do a WriteToFile(string file) and WriteToFile(ostream&) so I could do the multiple holders in a single file approach... (right?)


Thanks!
0
 
Infinity08Commented:
>> WriteToFile(ostream&)

I would go for that one rather than the string filename one, but yes, that's the idea.
0
 
holobytedAuthor Commented:
Idea would be to have WriteToFile(string) create the stream and then call WriteToFile(ostream&), as to avoid doing it myself every single time.. Yay laziness!

How much could BZip2 compress the resulting 100% binary file?
0
 
Infinity08Commented:
>> How much could BZip2 compress the resulting 100% binary file?

Depends on the contents of the file, and how similar the float values are. Give it a try with a representative file.
0
 
holobytedAuthor Commented:
Go figure, the resulting binary file is way larger than the compressed text-based one. (180Kb bz2text vs. 550 bz2binary)

Any suggestions?
0
 
sleep_pilotCommented:
What's the size of the pre-compressed files?
Can you list out all the numbers?
text-based, uncompressed
text-based, compressed
binary-based, uncompressed
binary-based, compressed

Can you give an example of the text-based file? (small example should be enough.)
0
 
holobytedAuthor Commented:
I'm on another computer right now, but the sizes were as follows:

text-based, unc: 1.8MB
text-based, comp: 140KB

binary-based, uncomp: (didn't check)
binary-based, comp: 500KB

The file structure is as follows:


<magic #>
string
string
string
long
long
long
int
[N doubles, according to 1st long]
[M doubles, according to 2nd * 3rd long]
{
 int
 int
 char
} [for each int [the one after longs]]

Open in new window

0
 
Infinity08Commented:
How did you fill the binary file ? Can you give a sample of the original file ? Are they all small values maybe ?
0
 
holobytedAuthor Commented:
Most of the floats are in the range [-1, +1] inclusive. I'm actually trying to serialize a neural network to file; those are the inter-neuron weights.

I was filling the file w/ lines like the following:

file.write(reinterpret_cast<char*>(inputWeight[i]), sizeof(float));
0
 
Infinity08Commented:
Can you give a representative sample of the file to get an idea of what kind of values you are dealing with ... ?
0
 
holobytedAuthor Commented:
Here goes. It's only a segment of the file, but it goes on and on:


912559
blah.test
d6011631b1fa3890bcce53ef6fc422fa
0
0
174764
0
1
-0.764606
-0.693147
-0.97186
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.97186
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.97186
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.97186
-1.05209
-0.764606
-0.693147
-0.97186
-1.05209
-0.764606
-0.693147
-0.97186
-1.05209
-0.764606
-0.693147
-0.97186
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.97186
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.97186
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147
-0.971861
-1.05209
-0.764606
-0.693147

Open in new window

0
 
Infinity08Commented:
There are a lot of repeating values/patterns in that sample, which helps compression a lot (that's why it's so impressive).

That's also why the binary data didn't get compressed as well - the textual values are "too similar".

In this specific case, it might be better to keep it in text format, and compress it like that.
0
 
holobytedAuthor Commented:
Figured as much. So, would the initial number packing be my only hope of further reducing the filesize? Have any more ideas?
0
 
Infinity08Commented:
>> So, would the initial number packing be my only hope of further reducing the filesize?

Did that actually reduce the file size compared to the compression you posted earlier ? By how much ?
0
 
holobytedAuthor Commented:
I haven't tested it yet, but it's already cutting down all floats by 50% pre-compression... I'll give it a go. But other than that, any ideas?
0
 
sleep_pilotCommented:
I suppose you can turn up the compression of bzip too. (If you are not already at max.)
0
 
Infinity08Commented:
Well, the compression software you use is really good at what it does. And especially in this specific case, it gets some really good input data. Trying to do better than what it already does is probably a waste of time.


>> I'll give it a go.

Let me know what it gives ;)
0
 
holobytedAuthor Commented:
I'll probably be hijacking my own thread, but do you guys happen to know how to use libBZip2 on Windows and Linux? Either that, or an implementation of it (akin to Base64) that could compress it.

I'm actually piping it through to linux's bzip2 command, but as you can see, it's not really portable... (Using -9, max comp.)
0
 
Infinity08Commented:
On the official site (http://bzip.org/), you'll find the source code, which should be compilable on Windows too. Consider using libbzip2 linked to your application instead of calling the bzip2 executable though.
0
 
holobytedAuthor Commented:
I'm currently testing the BCD implementation; however, I've noticed that my compiler (GCC 4.2 & 4.3) is truncating the decimals when being passed to the function. They're fine when I output them to the file, though... (file << decimalX)

double x = 143.598123;
cout << BCD::FromDouble(x) << endl; // treats x as 143.599

How can I stop this from happening?
0
 
holobytedAuthor Commented:
Although this code worked, I've hit the same roadblock as with pure binary files. How would I go on about making it all use the printable characters?

(ie, instead of "10" showing 0001 0000 => unprintable, it'd show something like "A", where "53" would be "d" or something; just an example, don't really care which characters are used [only need them to be printable so BZip2 can work its magic])

Code follows:
inline int BCD::Char2Int(char c) {
    switch (c) {
        case '-': return 0x0A;
        case '.': return 0x0B;
        case '0': return 0x0C;
        default:
            if ('1' <= c && c <= '9')
                return c - '0';
            break;
    }
    return 0;
}
 
inline char BCD::Int2Char(int i) {
    switch (i) {
        case 0x0A: return '-';
        case 0x0B: return '.';
        case 0x0C: return '0';
        default:
            if (0x01 <= i && i <= 0x09)
                return (char) (i + '0');
            break;
    }
 
    return '\0';
}
 
string BCD::FromString(string number) {
    string output;
 
    if (number.size() % 2 == 1)
        number += (number.find(".") != string::npos) ? "0" : ".";
 
    char c = '\0';
 
    for (unsigned int i = 0; i < number.size(); i += 2) {
        c = (Char2Int(number[i]) << 4) | Char2Int(number[i + 1]);
 
        output += c;
        c = '\0';
    }
 
    return output;
}
 
string BCD::ToNormalString(string number) {
    string output;
    for (unsigned int i = 0; i < number.size(); i++) {
        output += Int2Char((number[i] >> 4) & 0x0F);
        output += Int2Char(number[i] & 0x0F);
    }
 
    return output;
}

Open in new window

0
 
Infinity08Commented:
>> cout << BCD::FromDouble(x) << endl; // treats x as 143.599

I'm not sure what FromDouble does, but when you want to output a double value with a certain precision, you should make use of stream manipulators (setprecision more specifically)

        http://www.cplusplus.com/reference/iostream/manipulators/setprecision.html


>> I've hit the same roadblock as with pure binary files.

What roadblock is that ?


>> (ie, instead of "10" showing 0001 0000 => unprintable, it'd show something like "A",

Try outputting the characters in hex :

        http://www.cplusplus.com/reference/iostream/manipulators/hex.html
0
 
holobytedAuthor Commented:
I tried setprecision(), didn't do anything.

cout << setprecision(8) << x;

still returned 143.599.

What I meant by the same roadblock was that using the "number packing" algorithm I initially asked for here, actually incremented the compressed file size by about 100kb, due to some characters being nonprintable.

My next idea was to use the same approach (packing 2 digits into 1 char), but attempting to use the whole spectrum of printable chars. [A-Za-z0-9] plus symbols: !@#$%^&*()_+-/?][';"... probably missed some.

Wow, just noticed that this thread is huge.
0
 
Infinity08Commented:
>> I tried setprecision(), didn't do anything.

Can you show what FromDouble does ?


>> using the "number packing" algorithm I initially asked for here, actually incremented the compressed file size by about 100kb, due to some characters being nonprintable.

That's what I thought would happen :) As I said : the plain text input is already quite ideal for the compression tool ... No reason to try to make it even more ideal ;)


What is it that you are trying to achieve actually ? Isn't the compressed size small enough ?
0
 
holobytedAuthor Commented:
Well, I had to try to achieve lower sizes :p I'm trying to compress the serialized output at its max, whatever it takes... that's the whole idea, actually. The lower, the better.
string BCD::FromDouble(double number) {
    ostringstream stringStream;
    stringStream << setprecision(10) << number;
 
    return FromString(stringStream.str());
}
 
string BCD::FromString(string number) {
    string output;
 
    if (number.size() % 2 == 1)
        number += (number.find(".") != string::npos) ? "0" : ".";
 
    char c = '\0';
 
    for (unsigned int i = 0; i < number.size(); i += 2) {
        c = (Char2Int(number[i]) << 4) | Char2Int(number[i + 1]);
 
        output += c;
        c = '\0';
    }
 
    return output;
}

Open in new window

0
 
Infinity08Commented:
>> The lower, the better.

I think you're getting into the domain where the efforts are not worth the results.



>>     stringStream << setprecision(10) << number;

    stringStream << fixed << setprecision(10) << number;
0
 
holobytedAuthor Commented:
Well, it's time to close this question then. Thanks.

Could you contact me? E-mail is bscheiman; googlemail.
0
 
holobytedAuthor Commented:
Sorry for all the discussions; appreciate it! :)
0
 
Infinity08Commented:
>> Could you contact me?

Why ?
0
 
holobytedAuthor Commented:
I could use some brainstorming. Up to you though...
0
 
Infinity08Commented:
If you post a question about it on EE, I'd be happy to help you out :) We are not allowed to communicate about questions via mail though ...
0
 
holobytedAuthor Commented:
Eh, I just meant all-around brainstorming :p I'll use the site then; thanks
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.