Solved

Can you make this code faster?

Posted on 2003-11-25
28
766 Views
Last Modified: 2012-08-14
Hi Experts,

In a particular calculation that I do, I have three strings (composed of 0s and 1s) that I use to produce a number. All 3 strings (fp1, fp2 and fp3) are of the same length. I compare the 0s and 1s along the length of the string. If fp3 is "1" and either fp1 or fp2 is "1", add one to nA.
If all three strings have "1" at a particular position, add one to nB.
Return value is (double)nB / nA *100

(see post http://www.experts-exchange.com/Programming/Programming_Languages/Cplusplus/Q_20738573.html for some previous history, although the way I'm doing things has changed slightly since then).

In the code below, SimCalc works and gives me the result I require.

I'm having issues now with the lengths of my strings - they are in the tens of thousands in length, and so I've been trying to compress the initial strings with the following sequence of code, in attempt to save some storage space on my hard drive, and also in an attempt to speed things up. The following code works, but I've also suceeded in slowing down the calculation several fold in the function SimCalc2.

I include below my attempt at compression, with some explanation throughout the code of what I'm trying to do.

//CUT HERE ============================================
#include <iostream>
#include <string>
#include <iomanip>
#include <windows.h>            //for GetTickCount function
using namespace std;

double SimCalc(string& fp1, string& fp2, string& fp3);  //prototypes
unsigned char string2char(string& binstring);
string CompressString(string& tfp);
int numbits (unsigned char b);
double SimCalc2(string& fp1, string& fp2, string& fp3, int* pCS, string& fstring);

void main()
{
      string fp1 = "00110010111010010010100111110011";
      string fp2 = "00001101111010110100100110100101";
      string fp3 = "11101001100001101001100110000111";

      int maxit = 10000;      //max iterations for timing
      double n = 0.0;
       unsigned long tic = GetTickCount();
      for ( int k = 0; k<maxit; k++) {
           n= SimCalc( fp1, fp2, fp3);      //standard calculation
                        //strings not compressed
      }
       unsigned long tic2 = GetTickCount();
      cout << "Took simcalc  " << (tic2 -tic) <<" ms for " << maxit << " iterations\n";
          cout << "Answer is "<< n << " " << fp1.length() << " units long\n\n";

      //building a string to contain all unsigned chars - use this in SimCalc2
      string fstring = "";
      for (int z =0; z<= 255;++z) {
            unsigned char w = z;
            fstring += w;
      }

      //Compress all three strings - see function CompressString for details.
      string fp1c = CompressString(fp1);
      string fp2c = CompressString(fp2);
      string fp3c = CompressString(fp3);

      //create array to store the number of bits set in an unsigned char - used
      //in SimCalc2
      int* pCS = new int[255];
      for (int x =0; x<=255;++x) {
            *(pCS + x) = numbits(x);
      }

      //perform SimCalc2 calc with new compressed strings
      n = 0.0;
       tic = GetTickCount();
      for ( k = 0; k<maxit; k++) {
            n= SimCalc2(fp1c, fp2c, fp3c, pCS, fstring);
      }
       tic2 = GetTickCount();
      cout << "Took simcalc2 " << (tic2 -tic) <<" ms for " << maxit << " iterations\n";
       cout << "Answer is "<< n << " " << fp1c.length() << " units long\n\n";
}

double SimCalc(string& fp1, string& fp2, string& fp3)
{
    const char* p1 = fp1.c_str();  //pointer directly to data
    const char* p2 = fp2.c_str();
    const char* p3 = fp3.c_str();

    int nA= 0;
    int nB= 0;
      int nLen= fp1.length();
          for ( int j=0; j< nLen; ++j, ++p1, ++p2, ++p3 ) {
                  if (*p3 == '1')      {
                        if ((*p1 | *p2)== '1') {  
                              ++nA;            
                              if ((*p1 + *p2) == ('1'+'1')){
                                    ++nB;
                              }
                        }
                  }
            }
      if (nA == 0) return 0.0;
    return( ((double)nB / nA) *100 );

}
unsigned char string2char(string& binstring)
{
      //converts a strong of 0s and 1s (8 in length)
      //to an unsigned char
      int x = 128;
      int result = 0;
      string BS = "";
      for (int i = 0; i <8; ++i, x /= 2) {
            BS = binstring.substr(i, 1);
            if (BS == "1") {
                  result += x;
            }
      }
      unsigned char crap = result;
      return result;
}

string CompressString(string& tfp)
{
      //takes a strings of 0s and 1s and changes it,
      //eight characters at a time to a string of unsigned
      //chars
      string result = "";
      while (tfp.length() >7)      {
            string temp = tfp.substr(0,8);
            result += string2char(temp);
            tfp = tfp.substr(8);
      }
      return result;
}

int numbits (unsigned char b)
{
      //counts number of bits set in an unsigned char
      unsigned char mask;
      int result;
      for ( result = 0, mask = 0x80; mask != 0; mask = ( mask >> 1)) {
          if (b & mask) {
                 ++result;
          }
    }
      return result;
}

double SimCalc2(string& fp1, string& fp2, string& fp3, int* pCS, string& fstring)
{
    int nA= 0;
    int nB= 0;
      int p31, p32;
      unsigned char p1, p2, p3;
          for ( int j=0; j< fp1.length(); ++j)      {
                  //following gives me index of the character
                  p1 = fstring.find(fp1.substr(j,1));
                  p2 = fstring.find(fp2.substr(j,1));
                  p3 = fstring.find(fp3.substr(j,1));
                  
                  p31 = (p3 & p1);
                  p32 = (p3 & p2);
                  //link back to pCS array, to give number of bits set
                  nA += *(pCS + (p31 | p32));
                  nB += *(pCS + (p32 & p31));
            }
      if (nA == 0) return 0.0;
    return( ((double)nB / nA) *100 );

}
//CUT TO HERE   ===============================================

Would love your advice on how I can speed up the SimCalc2 calculation, or the way I'm trying to do things.
0
Comment
Question by:s_lakshma
  • 9
  • 7
  • 7
  • +2
28 Comments
 
LVL 7

Expert Comment

by:burcarpat
Comment Utility
hmmm, remind me, why weren't you using std::bitset or something like that but strings instead?  'cause, that would save a lot of space without extra code

-- ba
0
 

Author Comment

by:s_lakshma
Comment Utility
ba-

Am pretty new to C++  - first time I've of heard of bitset. Will have to read up on it.
0
 
LVL 7

Assisted Solution

by:burcarpat
burcarpat earned 30 total points
Comment Utility

std::bitset stores the data in a special structure where you can access individual bits.  you can store 8 bits in a single char, which means your storage requirement is going to be 1/8th of the original.  furthermore, std::bitset is typically optimized nicely

the only drawback is, std::bitset is a static structure, i.e. you can't change its size during runtime.  for that, i suggest using boost's dynamic_set:

    http://www.boost.org/libs/dynamic_bitset/dynamic_bitset.html

one final note.  you can achieve significant speeds gain by building a pre-calculated table for comparing string snippets instead of individual bits.  consider this simple example:  let's say we have only two bitsets, each with 16 bits.  let's say, we want to calculate the mismatch between them ( i.e. whenever the corresponding bit in each bitset is different, we want to increase the distance by 1 ).  this is a simplified version of your problem.  then, you can simply divide the bitsets into 4 snippets ( each containing 4 bits )

it gets a tad more complex after this.  as these 4 bit snippets are nothing but binary numbers, you can actually convert them to their decimal equivalents.  that means for each unique bit combination, there is a unique decimal number that can be used as an index.  once you have these indices, you can pre-calculate a difference table once at the very beginning and then compare strings snippet by snippet instead of bit by bit

let me know if you are interested in this.  i can try to change the code i already have to suit your distance function

-- ba


[ about boost.org :: boost.org is an organization supported by many c++ standards committee members and provides 100% free, peer-reviewed, cross-platform libraries.  many of the boost libraries, such as their smart pointer library, are already in the drafts of the next revision of the c++ standard ]
0
 
LVL 49

Expert Comment

by:DanRollins
Comment Utility
It helps to map these things out.  There are 8 possibilities at each position:

 fp1 fp2 fp3 action
 --- --- --- ---------
  0   0   0  (nothing)
  1   0   0  (nothing)
  0   1   0  (nothing)
  1   1   0  (nothing)
  0   0   1  (nothing)
  1   0   1  nA++
  0   1   1  nA++
  1   1   1  nA++, nB++

Inspection shows that
   if fp3 is 0, nothing happens so if we AND against that, the results will be 0
   if we OR fp1 with fp2 we will get a 1 everywhere we want to increment nA
   If we AND all three, we will get a 1 everywhere we want to increment nB

This boils down nicely to:
   If  (fp1 |  fp2)  &  fp3  is non-zero, nA++
   If  (fp1 & fp2)  &  fp3  is non-zero, nB++

By using straight binary operations like this, we can process 32 bits at a time, which offers fantastic speed advantage over examining each character of three string individually.   So the solution is to take the incomming string composted of '1' and '0' in groups of 32 and convert into a 32-bit binary value.  Then do the ORs and ANDs, then count up the results.  Here is a sanity check on the idea:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

void main()
{
      DWORD n1= strtoul( "11111111111111111111110111111011", 0, 2 );  // base 2
      DWORD n2= strtoul( "00000000000010001000000001000000", 0, 2 );  // base 2
      DWORD n3= strtoul( "11111111111111111111111111111111", 0, 2 );  // base 2

      DWORD A= (n1 | n2) & n3;
      DWORD B= (n1 & n2) & n3;

      int nA= 0, nB= 0;
      DWORD nBit= 1;
      for (int j=0; j<32; j++ ) {
            if (A & nBit ) nA++;
            if (B & nBit ) nB++;
            nBit <<= 1;  // move over to next position
      }
      printf( "nA=%d nB=%d\r\n", nA, nB );
}
 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
As to your storage problem... it now becomes trivial...
Read in the strings and convert them to binary by taking 32 characters at a time (the strtoul library function will serve nicely) then output the binary values.  The resulting file will be one-eigth the size of the original.  During your conversion, if you reach the end of the file and have less than 32 characters, append a series of '0's to the end until it is 32 long.

Once the files are binary, you will just read the file in 4-byte chunks (directly into DWORD values) and use the OR and AND sequences shown to accumulate nA and nB.  Actually, for performance, you should probably read all three files into memory and just use DWORD pointers to access the data.

Incidently, this lends itself to an additional -- quite significant -- speedup.  The Pentium has a set of instructions that process 128-bit chunks like this -- the MMX and SSE opcodes.  In fact, they are designed for exactly this type of operation.  Got 500 points ? :)

-- Dan
0
 

Author Comment

by:s_lakshma
Comment Utility
Dan, ba...

While I've been learning about bitsets, I actually figured out that formula.. honest gov!

double SimCalc3(bitset<bsLength>& p1, bitset<bsLength>& p2, bitset<bsLength>& p3)
{
      int nA = (p3 & (p1 | p2)).count();
      int nB = (p3 & (p1 & p2)).count();
      if (nA == 0) return 0.0;
    return( ((double)nB / nA) *100 );

}

In anycase, I'll have to have a look at what you've both suggested over the next few days, to make sure I understand the new stuff, and am able to play with it a little. I've only got an extra 5 points at the moment, as soon as I join, I'll post for the 500 point solution!
0
 
LVL 7

Expert Comment

by:burcarpat
Comment Utility
> "I'll have to have a look at what you've both suggested over the next few days..."

actually, what dan posted is an implementation of what i suggested ( yet, without using stl as is expected from dan ;-) ).  because he went through the trouble of showing implementation examples, he should get the points

-- ba
0
 
LVL 49

Expert Comment

by:DanRollins
Comment Utility
I was joking about the points.  I love this kind of problem :)
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
I tried to speed up your program by changing only SimCalc2. My last version was as fast as SimCalc1 (16 ms / 10000).

double SimCalc2(string& fp1, string& fp2, string& fp3, int* pCS, string& fstring)
{
    int nA= 0;
    int nB= 0;
    int p31, p32;
    unsigned char p1, p2, p3;
    const char* pp1 = fp1.c_str();
    const char* pp2 = fp2.c_str();
    const char* pp3 = fp3.c_str();
    for ( int j=0; j< fp1.length(); ++j)    
    {
        /*
               //following gives me index of the character
               p1 = fstring.find(fp1.substr(j,1));
               p2 = fstring.find(fp2.substr(j,1));
               p3 = fstring.find(fp3.substr(j,1));
               
               p31 = (p3 & p1);
               p32 = (p3 & p2);
               //link back to pCS array, to give number of bits set
               nA += *(pCS + (p31 | p32));
               nB += *(pCS + (p32 & p31));
        */
        p1 = pp1[j];
        p2 = pp2[j];
        p3 = pp3[j];

        unsigned int t = 0;
        for (int i= 0; i < 8; i++)
        {
            t = 1<<i;
            if ((p3 & t))
            {
                if ((p1 & t) || (p2 & t))
                {
                    nA++;
                    if ((p1 & t) && (p2 & t))
                    {
                        nB++;
                    }
                }
            }
        }
    }
    if (nA == 0) return 0.0;
    return( ((double)nB / nA) *100 );

}

The point is to avoid any string member functions.

Hope, that helps

Alex

BTW, you could speed up CompressString same way but it runs only once here.

0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
I changed my last approach from bytes to integers and got it to 0ms for 10000 iterations.

------------------- your code ----------------------------
Took simcalc  15 ms for 10000 iterations
Answer is 45.4545 32 units long

Took simcalc2 469 ms for 10000 iterations
Answer is 45.4545 4 units long

------------------ code using cfpx[j] in SimCalc2 --------------------
J:\test\virt\Debug>virt
Took simcalc  16 ms for 10000 iterations
Answer is 45.4545 32 units long

Took simcalc2 63 ms for 10000 iterations
Answer is 45.4545 4 units long

------------------ code using const char* in loop (see last answer) --------------------
J:\test\virt\Debug>virt
Took simcalc  16 ms for 10000 iterations
Answer is 45.4545 32 units long

Took simcalc2 16 ms for 10000 iterations
Answer is 45.4545 4 units long

----------------- last code (see below) -----------------------------------------------------

J:\test\virt\Debug>virt
Took simcalc  16 ms for 10000 iterations
Answer is 45.4545 32 units long

Took simcalc2 0 ms for 10000 iterations
Answer is 45.4545 4 units long


double SimCalc2(string& fp1, string& fp2, string& fp3, int* pCS, string& fstring)
{
    int nA= 0;
    int nB= 0;
    // int p31, p32;
    // unsigned char   p1, p2, p3;
    unsigned int    u1, u2, u3;
    const char*     pp1 = fp1.c_str();
    const char*     pp2 = fp2.c_str();
    const char*     pp3 = fp3.c_str();
    const unsigned* pu1 = (const unsigned*)pp1;
    const unsigned* pu2 = (const unsigned*)pp2;
    const unsigned* pu3 = (const unsigned*)pp3;

    for ( int j=0; j< fp1.length(); j += 4)    
    {
        /*
               //following gives me index of the character
               p1 = fstring.find(fp1.substr(j,1));
               p2 = fstring.find(fp2.substr(j,1));
               p3 = fstring.find(fp3.substr(j,1));
               
               p31 = (p3 & p1);
               p32 = (p3 & p2);
               //link back to pCS array, to give number of bits set
               nA += *(pCS + (p31 | p32));
               nB += *(pCS + (p32 & p31));
        p1 = pp1[j];
        p2 = pp2[j];
        p3 = pp3[j];
        */
        u1 = pu1[j/4];
        u2 = pu2[j/4];
        u3 = pu3[j/4];

        unsigned int t = 0;
        for (int i= 0; i < 32; i++)
        {
            t = 1<<i;
            if ((u3 & t))
            {
                if ((u1 & t) || (u2 & t))
                {
                    nA++;
                    if ((u1 & t) && (u2 & t))
                    {
                        nB++;
                    }
                }
            }
        }
    }
    if (nA == 0) return 0.0;
    return( ((double)nB / nA) *100 );

}

So, if your strings have a length that is a multiple of 32 you might take this code.

Regards, Alex
0
 
LVL 49

Expert Comment

by:DanRollins
Comment Utility
itsmeandnobodyelse,
>>got it to 0ms for 10000 iterations
The resolution of the GetTickCount()-based timer seems to be about 15ms or 16ms on many machines.  And any process that runs for more than 30ms is likely to be interrupted by a task switch one or many times.  

If you ever get a time of 0ms, that means you need to increase th size of your test run by (at least) an order of magitude.

-- Dan
0
 
LVL 7

Expert Comment

by:burcarpat
Comment Utility
not to mention, of course, the test is highly dependent on the test program itself.  depending on how you are testing stuff, the compiler might be actually optimizing certain parts away ( although they won't be optimized away in the actual code ).  also, i'd suggest testing with at least a million if not 10 million

oh, and, when used properly, std::string functions are as fast as their counterparts ( had to say it ;-) )

-- ba
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
Dan,
you are right. i never got a resolution greater 0 and less than 15 ms.

Increasing to 1000000 gives time numbers like these:

Took simcalc  688 ms for 1000000 iterations
Answer is 45.4545 32 units long

Took simcalc2 844 ms for 1000000 iterations
Answer is 45.4545 4 units long

So, optimized compressed solution was about 20 % slower than non-optimized uncompressed one (on my machine).

A last attempt for SimCalc2 was this:

double SimCalc2(string& fp1, string& fp2, string& fp3) //, int* pCS, string& fstring)
{
    static unsigned  t[32] =
    {   1,     2,     4,     8,     16,    32,    64,    128,   256,   512,   1024,  2048,  4096,  8192,  16384, 32768,
        1<<16, 1<<17, 1<<18, 1<<19, 1<<20, 1<<21, 1<<22, 1<<23, 1<<24, 1<<25, 1<<26, 1<<27, 1<<28, 1<<29, 1<<30, 1<<31,
    };
   
    int nA= 0;
    int nB= 0;
    unsigned int    u1, u2, u3;
    const char*     pp1 = fp1.c_str();
    const char*     pp2 = fp2.c_str();
    const char*     pp3 = fp3.c_str();
    const unsigned* pu1 = (const unsigned*)pp1;
    const unsigned* pu2 = (const unsigned*)pp2;
    const unsigned* pu3 = (const unsigned*)pp3;

    int len = fp1.length() / 4;
    for ( int j=0; j< len; j++)    
    {
       
        u3 = pu3[j];
        u1 = pu1[j];
        u2 = pu2[j];

        unsigned* pt = t;
        for (int i= 0; i < 32; i++, pt++)
        {
            if (u3 & *pt)
            {
                if ((u1 & *pt) || (u2 & *pt))
                    nA++;
                {
                   if ((u1 & *pt) && (u2 & *pt))
                      nB++;
                }
            }
        }
    }
    if (nA == 0) return 0.0;
    return( ((double)nB / nA) *100 );

}

That gives these times

Took simcalc  671 ms for 1000000 iterations
Answer is 45.4545 32 units long

Took simcalc2 672 ms for 1000000 iterations
Answer is 45.4545 4 units long



Regards, Alex



0
 

Author Comment

by:s_lakshma
Comment Utility
Been working away and impementing your various suggestions - these are the timings I get on my laptop for 10 million iterations, and orginal strings 32 in length.

That took simcalc  (ORIGINAL) 5888 ms
Answer is 28.5714 32 units long

That took simcalc3 (bitsets)3295 ms
Answer is 28.5714 4

That took simcalc4 (ALEX) 5478 ms
Answer is 28.5714 4

That took simcalc5 (DAN)4436 ms
Answer is 28.5714 4

So, for the time being, bitsets look the best. I create them from the orginal strings just once with:

      bitset<bsLength> bs1(fp1);
      bitset<bsLength> bs2(fp2);
      bitset<bsLength> bs3(fp3);
 where bsLength is 32

and then the following function is called..

double SimCalc3(bitset<bsLength>& p1, bitset<bsLength>& p2, bitset<bsLength>& p3)
{
      int nA = (p3 & (p1 | p2)).count();
      int nB = (p3 & (p1 & p2)).count();
      if (nA == 0) return 0.0;
    return( ((double)nB / nA) *100 );

}

Going to keep working away on this for a few more days, as I've still got plenty to learn, and I've also got to download the boost stuff. Thanks for hte help guys - I'll probably post back after the holiday weekend....
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
I stole the algorithm from bitset::count and voilà

Took simcalc  344 ms for 1000000 iterations  Origin
Answer is 45.4545 32 units long

Took simcalc2 281 ms for 1000000 iterations  My best
Answer is 45.4545 4 units long

Took simCalc3 62 ms for 1000000 iterations   bitset
Answer is 45.4545 bitset<32>

Took simCalc4 282 ms for 1000000 iterations  Dan DWORD only
Answer is 45.4545 1 DWORD32 bits

Took simCalc5 62 ms for 1000000 iterations   Dan's + count from bitset
Answer is 45.4545 1 DWORD and count from bitset

The last code was:

double SimCalc3(bitset<bsLen>& p1, bitset<bsLen>& p2, bitset<bsLen>& p3)
{
     int nA = (p3 & (p1 | p2)).count();
     int nB = (p3 & (p1 & p2)).count();
     if (nA == 0) return 0.0;
    return( ((double)nB / nA) *100 );

}

double SimCalc4(DWORD d1, DWORD d2, DWORD d3)
{
     DWORD da= (d1 | d2) & d3;
     DWORD db= (d1 & d2) & d3;

     int nA= 0, nB= 0;
     DWORD nBit= 1;
     for (int j=0; j<32; j++ )
     {
          if (da & nBit ) nA++;
          if (db & nBit ) nB++;
          nBit <<= 1;  // move over to next position
     }
     if (nA == 0) return 0.0;
    return( ((double)nB / nA) *100 );

}

double SimCalc5(DWORD d1, DWORD d2, DWORD d3)
{
     DWORD da= (d1 | d2) & d3;
     DWORD db= (d1 & d2) & d3;

     int nA = 0;
     int nB = 0;
         for (unsigned long _X = da, _Y = db; _X != 0; _X >>= 4, _Y >>=4)
     {
            nA += "\0\1\1\2\1\2\2\3"
                  "\1\2\2\3\2\3\3\4"[_X & 0xF];
            nB += "\0\1\1\2\1\2\2\3"
                  "\1\2\2\3\2\3\3\4"[_Y & 0xF];
     }

     if (nA == 0) return 0.0;
    return( ((double)nB / nA) *100 );

}

it's very tricky using half words that have the bit count statically.
I never saw such thing like "ABCD"[i].

Regards, Alex

0
Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
Sorry, half bytes (4 bits) not half words

Regards, Alex
0
 
LVL 49

Assisted Solution

by:DanRollins
DanRollins earned 70 total points
Comment Utility
clever trick.  I think it could just as easily be written:

    BYTE OnesInNybble[16] = {0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4};
....
    nA +=  OnesInNybble[ _X & 0xF ];
...etc...  

But the authors of STL are the Kings Of Obfuscation.

Presumably a 256-byte table could be used to make that part of it twice as fast -- count 'em up 8-bits at a time, like:
    nA +=  OnesInByte[ _X & 0xFF ];

-- Dan
0
 
LVL 39

Accepted Solution

by:
itsmeandnobodyelse earned 150 total points
Comment Utility
Here it is:

Took simcalc  328 ms for 1000000 iterations
Answer is 45.4545 32 units long

Took simcalc2 281 ms for 1000000 iterations
Answer is 45.4545 4 units long

Took simCalc3 63 ms for 1000000 iterations
Answer is 45.4545 bitset<32>

Took simCalc4 312 ms for 1000000 iterations
Answer is 45.4545 1 DWORD32 bits

Took simCalc5 47 ms for 1000000 iterations
Answer is 45.4545 1 DWORD and count from bitset using bytes


The code:

double SimCalc5(DWORD d1, DWORD d2, DWORD d3)
{
    static BYTE bits[256] =
    {   0, 1, 1, 2, 1, 2, 2, 3,
        1, 2, 2, 3, 2, 3, 3, 4,
        1, 2, 2, 3, 2, 3, 3, 4,
        2, 3, 3, 4, 3, 4, 4, 5,
        1, 2, 2, 3, 2, 3, 3, 4,
        2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5,
        3, 4, 4, 5, 4, 5, 5, 6,
        1, 2, 2, 3, 2, 3, 3, 4,
        2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5,
        3, 4, 4, 5, 4, 5, 5, 6,
        2, 3, 3, 4, 3, 4, 4, 5,
        3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6,
        4, 5, 5, 6, 5, 6, 6, 7,
        1, 2, 2, 3, 2, 3, 3, 4,
        2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5,
        3, 4, 4, 5, 4, 5, 5, 6,
        2, 3, 3, 4, 3, 4, 4, 5,
        3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6,
        4, 5, 5, 6, 5, 6, 6, 7,
        2, 3, 3, 4, 3, 4, 4, 5,
        3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6,
        4, 5, 5, 6, 5, 6, 6, 7,
        3, 4, 4, 5, 4, 5, 5, 6,
        4, 5, 5, 6, 5, 6, 6, 7,
        4, 5, 5, 6, 5, 6, 6, 7,
        5, 6, 6, 7, 6, 7, 7, 8,
    };

     DWORD da= (d1 | d2) & d3;
     DWORD db= (d1 & d2) & d3;

     int nA = 0;
     int nB = 0;
         for (unsigned long _X = da, _Y = db; _X != 0; _X >>= 8, _Y >>=8)
     {
            nA += bits[_X & 0xFF];
            nB += bits[_Y & 0xFF];
     }

     if (nA == 0) return 0.0;
    return( ((double)nB / nA) *100 );

}

You see, it is 25% better.

Regards, Alex




Regards, Alex
0
 
LVL 1

Expert Comment

by:bearcrsw
Comment Utility
very simple suggestion , as I haven't  seen it suggested yet

have you changed your project setting from Debug to Release. Under the Build menu select Set Active Configuration.

Steve

0
 

Author Comment

by:s_lakshma
Comment Utility
Thanks for everybodys efforts. Here is the final code that I'm going to use:

//CUT HERE==========================================================
#include <iostream>
#include <string>
#include <windows.h>            //for GetTickCount function
#include <vector>

using namespace std;

double SimCalc(string& fp1, string& fp2, string& fp3);
double SimCalc9(vector<DWORD>& d1, vector<DWORD>& d2, vector<DWORD>& d3);

void main()
{
      string fp1l = "1001010010101100100100101010010111001";
      string fp2l = "0100101001010110010010010101001011100";
      string fp3l = "1001000101011001001001010100101110010";
      //add zeros to end of strings to get a multiple of
      //32 characters in length.
      int theSize = fp1l.length() / 32;
      int numtoadd = (32 - (fp1l.length() % 32));
      if (fp1l.length() % 32 != 0)
      {
            ++theSize;
            int tempint = fp1l.length() / 32;
            for (int q = 0; q< numtoadd; ++q)
            {
                  fp1l += "0";
                  fp2l += "0";
                  fp3l += "0";
            }
      }

      vector<DWORD> vfp1l(theSize);
      vector<DWORD> vfp2l(theSize);
      vector<DWORD> vfp3l(theSize);

//load up a DWORD vector
      for (int z = 0; z<(fp1l.length() / 32); ++z)
      {
            string temp1 = fp1l.substr(z*32, 32);
            string temp2 = fp2l.substr(z*32, 32);
            string temp3 = fp3l.substr(z*32, 32);

            vfp1l[z] = strtoul(temp1.c_str(), 0, 2);
            vfp2l[z] = strtoul(temp2.c_str(), 0, 2);
            vfp3l[z] = strtoul(temp3.c_str(), 0, 2);
      }
      int maxit = 10000000;

      double n = 0.0;
       unsigned long tic = GetTickCount();
      for ( int k = 0; k<maxit; k++) {
            n= SimCalc9(vfp1l, vfp2l, vfp3l);
      }
       unsigned long tic2 = GetTickCount();
      cout << "That took simcalc9 (dword+stolen count+256+vector)" << (tic2 -tic) <<" ms \n";
    cout << "Answer is "<< n << " " << vfp1l.size() << endl<<endl;
      
       tic = GetTickCount();
      for ( k = 0; k<maxit; k++) {
           n= SimCalc( fp1l, fp2l, fp3l);
      }
       tic2 = GetTickCount();
      cout << "That took simcalc  (ORIGINAL)  " << (tic2 -tic) <<" ms \n";
    cout << "Answer is "<< n << " " << fp1l.length() << " units long\n\n";
}
double SimCalc(string& fp1, string& fp2, string& fp3)
{
    const char* p1 = fp1.c_str();
    const char* p2 = fp2.c_str();
    const char* p3 = fp3.c_str();

    int nA= 0;
    int nB= 0;
      int nLen= fp1.length();
          for ( int j=0; j< nLen; ++j, ++p1, ++p2, ++p3 ) {
                  if (*p3 == '1')      {
                        if ((*p1 | *p2)== '1') {  
                              ++nA;            
                              if ((*p1 + *p2) == ('1'+'1')){
                                    ++nB;
                              }
                        }
                  }
            }
      if (nA == 0) return 0.0;
    return( ((double)nB / nA) *100 );

}
double SimCalc9(vector<DWORD>& d1, vector<DWORD>& d2, vector<DWORD>& d3)
{
    static BYTE bits[256] =
    {   0, 1, 1, 2, 1, 2, 2, 3,
        1, 2, 2, 3, 2, 3, 3, 4,
        1, 2, 2, 3, 2, 3, 3, 4,
        2, 3, 3, 4, 3, 4, 4, 5,
        1, 2, 2, 3, 2, 3, 3, 4,
        2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5,
        3, 4, 4, 5, 4, 5, 5, 6,
        1, 2, 2, 3, 2, 3, 3, 4,
        2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5,
        3, 4, 4, 5, 4, 5, 5, 6,
        2, 3, 3, 4, 3, 4, 4, 5,
        3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6,
        4, 5, 5, 6, 5, 6, 6, 7,
        1, 2, 2, 3, 2, 3, 3, 4,
        2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5,
        3, 4, 4, 5, 4, 5, 5, 6,
        2, 3, 3, 4, 3, 4, 4, 5,
        3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6,
        4, 5, 5, 6, 5, 6, 6, 7,
        2, 3, 3, 4, 3, 4, 4, 5,
        3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6,
        4, 5, 5, 6, 5, 6, 6, 7,
        3, 4, 4, 5, 4, 5, 5, 6,
        4, 5, 5, 6, 5, 6, 6, 7,
        4, 5, 5, 6, 5, 6, 6, 7,
        5, 6, 6, 7, 6, 7, 7, 8,
    };

      DWORD da, db;

     int nA = 0;
     int nB = 0;
       for (int z = 0; z<d1.size();++z)
       {
            da= (d1[z] | d2[z]) & d3[z];
            db= (d1[z] & d2[z]) & d3[z];
            for (unsigned long _X = da, _Y = db; _X != 0; _X >>= 8, _Y >>=8)
            {
                  nA += bits[_X & 0xFF];
                  nB += bits[_Y & 0xFF];
          }
       }
     if (nA == 0) return 0.0;
    return( ((double)nB / nA) *100 );

}
//CUT TO HERE=====================================================

Finally, could someone explain to me, in English, what the line..

                  nA += bits[_X & 0xFF];
 
is actually doing?

0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
>Finally, could someone explain to me, in English, what the line..
>
>      nA += bits[_X & 0xFF];
>
>is actually doing?

0xFF means all bits set of a byte. By making (_X & 0xFF) we strip down a 4byte unsigned long word to the last byte (or lowest byte of 4)
So, we have a byte that is between 0..255. That byte is used as an index to an array - bits - where the counts of bits of all numbers (byte values) from 0 to 255 is stored, e. g. 7 == bit 0 + bit 1 + bit 2 == 3 bits ==  bits[7] == bits[_X where lowest byte is 7].

By using _X<<8 the loop switches _X 8 bits to the right - filling up zero bits on the left side - that makes the next byte of _X become lowest. At least after 4 loop steps _X is 0 and lthe loop terminates. If one of the higher bytes of _X is 0 the loop terminates earlier.

So, the algorithm is to count the bits of all 4 bytes of a longword by using a predefined count table for all possible bit counts.

Hope, that was clear enough ;-)

Regards, Alex

BTW, i would recommend to split points.
 
0
 
LVL 49

Expert Comment

by:DanRollins
Comment Utility
itsmeandnobodyelse, very nice work on this question!
s_lakshma, thanks for the points and the grade :)
0
 

Author Comment

by:s_lakshma
Comment Utility
I think thats about as close to English as I'm going to get!

I think I understand it now though, enough to maybe use it somewhere else in the future.

Thanks again.
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
Oops, there is a bug!

change

          for (unsigned long _X = da, _Y = db; _X != 0; _X >>= 8, _Y >>=8)
to

          for (unsigned long _X = da, _Y = db; (_X + _Y)  != 0; _X >>= 8, _Y >>=8)

The first line wouldn't evaluate _Y correctly if one of the highest bytes of _X turns to 0.

Regards, Alex


0
 

Author Comment

by:s_lakshma
Comment Utility
Hi Alex,

I had noticed that. However I think its ok in its original form. It is impossible for nB to be incremented without nA being incremented. So doing the test for continuation with _X is good enough.

S
0
 
LVL 49

Expert Comment

by:DanRollins
Comment Utility
>>  could someone explain to me, in English, what the line.., nA += bits[_X & 0xFF];... is actually doing?

I'll give it a try:
Rather than counting up the ones (as in my original code), this system uses a table that says how many 1-bits exist in each possible byte (it is easy to know this in advance).  

To learn how many 1s exist in a particular byte, just use its value as an index into the table.  

To calculate how many 1s are in a 4-byte DWORD, we can just use each byte as an index into the table and add the looked-up value (number of 1s in that byte) to an accumulator.

To obtain each of four bytes of a DWORD, one at a time, we can AND the DWORD with 0x000000FF and shift it right by 8 bits.  If we do this four times, we will get our hands on four values each of which will be between 0 and 255 inclusive.  That makes them perfect for use with the lookup table.

Incidently, I'm seeing about a 5% speedup by unrolling the loop:

      int nA = 0;
      int nB = 0;
      for ( int z = 0; z<d1.size(); ++z )     {
            DWORD da= (d1[z] | d2[z]) & d3[z];
            DWORD db= (d1[z] & d2[z]) & d3[z];
            nA += bits[ da & 0x000000FF ];
            nB += bits[ db & 0x000000FF ];  
            da >>= 8; db >>= 8;
            nA += bits[ da & 0x000000FF ];
            nB += bits[ db & 0x000000FF ];
            da >>= 8; db >>= 8;
            nA += bits[ da & 0x000000FF ];
            nB += bits[ db & 0x000000FF ];
            da >>= 8; db >>= 8;
            nA += bits[ da & 0x000000FF ];
            nB += bits[ db & 0x000000FF ];
      }
      if (nA == 0) return 0.0;
      return( ((double)nB / nA) *100 );

-- Dan
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
Finally English and 5% faster.  ;-)


Regards, Alex
0
 

Author Comment

by:s_lakshma
Comment Utility
OK, had to do it. I set up the [256*256] byte table, unrolled the loops, and looked at 16 at a time. Following timings for one million iterations, strings 1280 in length.

That took simcalc  (ORIGINAL)  35341 ms
Answer is 65.3846 1280 units long

That took simcalc9 (DAN+stolen count+256+vector)3796 ms
Answer is 65.3846 40

That took simcalc10 (DAN+stolen count+256+vector+unroll)3234 ms
Answer is 65.3846 40

That took simcalc11 (DAN+stolen count+[256 * 256]+vector+unroll)1983 ms
Answer is 65.3846 40

So the last version takes 5.6% of the time of the original. I think its probably time to stop there!
0
 
LVL 49

Expert Comment

by:DanRollins
Comment Utility
A 64Kb lookup table could be justified if this program must analyse millions of very long strings; for instance, a program that took 3-1/2 hours would run in under 2 hours with that change.  And what modern system doesn't have 64KB of RAM to spare?
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

When writing generic code, using template meta-programming techniques, it is sometimes useful to know if a type is convertible to another type. A good example of when this might be is if you are writing diagnostic instrumentation for code to generat…
Templates For Beginners Or How To Encourage The Compiler To Work For You Introduction This tutorial is targeted at the reader who is, perhaps, familiar with the basics of C++ but would prefer a little slower introduction to the more ad…
The viewer will learn how to clear a vector as well as how to detect empty vectors in C++.
The viewer will be introduced to the technique of using vectors in C++. The video will cover how to define a vector, store values in the vector and retrieve data from the values stored in the vector.

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now