Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

Calculating Pi using BBP forumula - optimisations

Posted on 2005-05-01
26
Medium Priority
?
563 Views
Last Modified: 2007-12-19
Hi, currently I am working on a distributed computing project to work out the mathematical constant 'pi'. After taking a look at many examples and formulas for pi I have choson to use a common implementation of the BBP formula. This is a digit extraction formula which is able to compute any digit in Pi without having to know the n-1 digit. However, I have a problem with the script, it is quite slow, each time it is run it returns 9 digits of Pi from the value of n. If n was 1 it would return digits 1-9 . If in my basic test program I set n to be 10,000 it takes around 85 seconds to work out nine digits starting at position 10,000 of pi.

This is a problem for, as when larger values are reached (billions) it could takes weeks of cpu time just to get nine more digits. However, I am quite sure there is room for optimisation in the current formula I use. I a currently not sure which part of the formula is causing the slowdown, as my attempt to add in some debugging code failed. I have also been told by some people that it would be faster if the program used hardware cpu optimisations (sse?), however I am unsure of this. I have a feeling that is it the calculating of the prime numbers that is taking the time but I am still unsure. Here is a link to my current code as I did not want to post it in this topic.
http://www.dcrez.co.uk/create/pi.c
Thank you for any help/advice you can give .
0
Comment
Question by:EvilGuru
  • 10
  • 5
  • 5
  • +2
26 Comments
 
LVL 37

Expert Comment

by:Harisha M G
ID: 13903794
Hi EvilGuru,
    That is a great code...

    Here is the output in my computer... :)

Position:     10000 Digits: 856672279 Time:    243.09 seconds
Press any key to continue . . .

I made some changes... Then I got this result...

Position:     10000 Digits: 856672279 Time:    235.57 seconds
Press any key to continue . . .

The change...
___________________________________________
int next_prime(int n)
{
   n = n - 1 + n%2;            // Make n a odd number
   do {
      n+=2;                    // Here you will jump to only the odd numbers
   } while (!is_prime(n));
   return n;
}
___________________________________________

Bye
---
Harish
0
 
LVL 37

Expert Comment

by:Harisha M G
ID: 13903816
EvilGuru,

http://mathworld.wolfram.com/QuadraticSieve.html
has a better prime checking algorithm... but I don't know how to implement that :(
0
 

Author Comment

by:EvilGuru
ID: 13903853
Thanks! That got me down a good few seconds here as well. I am currently not sure what is taking up the time in the program, I have an idea that it is the primes so I thought about 'spoon feeding' the program a list of prime numbers in an array, however I am not totally sure how to do that or if there will be a large gain from it.
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 37

Expert Comment

by:Harisha M G
ID: 13903861
Surely prime checking is the delay factor

If you want that... look at these

http://primes.utm.edu/nthprime/algorithm.php
http://www.troubleshooters.com/codecorn/primenumbers/primenumbers.htm
0
 
LVL 37

Expert Comment

by:Harisha M G
ID: 13903866
If you want to spoon feed them...
http://www.nist.gov/dads/HTML/sieve.html
0
 
LVL 37

Expert Comment

by:Harisha M G
ID: 13903867
0
 

Author Comment

by:EvilGuru
ID: 13903957
Can anyone help implement that amazing C prime finder into the int is_prime function? I have had a few tries but cant seem to find out which bit actually does it. Thanks for all of your help, it is nothing short of amazing.
0
 
LVL 22

Expert Comment

by:grg99
ID: 13904088
You might want to look into how "Super pi" does its calculations.  According to its benchmarks file, ti can do ONE MILLION digits of Pi in under 300 seconds, and that's on an ancient 90MHz PC.

0
 

Author Comment

by:EvilGuru
ID: 13904230
Currently, using different algorithms I can do around 10 million in about 60 seconds. However, that method is both memory intensive and can not be used in distributed computng (which is what this project is about), so while this method may be slower, it is the only method and so must be optimisted as best as we can make it.
0
 
LVL 16

Expert Comment

by:PaulCaswell
ID: 13904732
I got:

Position:     10000 Digits: 856672279 Time:    111.81 seconds
Press any key to continue . . .

It is compiled under VC++. I had to change:

#define mul_mod(a,b,m) fmod( (double) a * (double) b, m)

to

#define mul_mod(a,b,m) (int)fmod( (double) a * (double) b, m)

because I was getting loads of warnings but I dont think that was the problem. I am going to use one of my code timing tools on it to get a better idea of where the time is going.

Back in a while.

Paul
0
 

Author Comment

by:EvilGuru
ID: 13907581
Whoever manages to integrates a fast prime system (and allows it to work with numbers up to 2 trillion) into the code will probably be the accepted answer, along with mgh_mgharish who found the great information on primes.
0
 
LVL 22

Expert Comment

by:grg99
ID: 13908094
Maybe it's just me, but it seems silly to use an algorithm that is TWO MILLION times slower than another algorithm, just because it's distributable.  If it's that slow, it's obvious this is an experiment in distributed processing, not a real attempt to make a fast Pi-finder.   So why spend much time on this aspect?

And it's risky to use (double) arithmetic--- with these algorithms you need EXACT integer results, which is something  floating-point numbers cannot guarantee.

0
 

Author Comment

by:EvilGuru
ID: 13908402
The accuracy of the formula is 100%, the use of double gives us 15 digits of accuracy when we only need 9. The slowness come from the fact that to work out large amounts huge amounts of ram would be needed (looking in tb here) as well as a cluster of many cpu's. The idea was not to make it fast, but most other forumulas have a headroom of around 17 billion, the idea of this is to make it faster, in which we have found out what is causing the slowdown and are not hoping for integration. Just take a look at pihex, they managed to find the quadillionth bit of Pi in around two years using a few slow 400mhz pc's.
0
 
LVL 22

Expert Comment

by:grg99
ID: 13908648
>the use of double gives us 15 digits of accuracy when we only need 9.

You may be right.

But have you considered what happens along these lines:


   15 minus 9 is six.  So you seem to have  5 (plus or minus one) digits of extra accuracy.

But that means that somewhere between one in ten thousand, to one in a million times, the extra digits are going to be all nines, or all zeroes, plus or minus a bit.    Which is going to generate a carry and affect the significant digits.  And the significant digits could be anything up to "999999999" to start with, so the last bit of inaccuracy can totally bollix your result.


Even if type double gives you 15 digits of accuracy, or 1000 digits of accuracy, or  a MILLION digits, it's not enough for doing truly, really, actually, verifiable,  precise integer math.

At least the way I see it.



0
 
LVL 3

Expert Comment

by:joju
ID: 13908963
any way the prime is not eating time..

the suspected code is

   for(k=1;k<=N;k++) {

       t=k;
       if (kq >= a) {
         do {
            t=t/a;
            v--;
         } while ((t % a) == 0);
         kq=0;
      }
      kq++;
      num=mul_mod(num,t,av);

      t=((k*2)-1);
      if (kq2 >= a) {
         if (kq2 == a) {
            do {
               t=t/a;
               v++;
            } while ((t % a) == 0);
         }
         kq2-=a;
      }
     den=mul_mod(den,t,av);
      kq2+=2;

       if (v > 0) {
        t=inv_mod(den,av);
        t=mul_mod(t,num,av);
        t=mul_mod(t,k,av);
        for(i=v;i<vmax;i++) t=mul_mod(t,a,av);
        s+=t;
        if (s>=av) s-=av;
      }

    }


Try commenting this part of the code..

it excecutes in .01 seconds


Thanxs,
Joju
0
 

Author Comment

by:EvilGuru
ID: 13909464
grg99, interesting point. To test it I went to http://www.angio.net/pi/bigpi.cgi and typed in "999999" and at point 762 that way found. From this I set the client up to  753 (+9 = 762, right where the nines start) and it gave me the result "870721134", not rounding up the 4 > 5 even though there was a string of nines. Is one of us seeing this wrong?
0
 
LVL 16

Expert Comment

by:PaulCaswell
ID: 13913081
EvilGuru,

I've been a bit long over this one as I had to reengineer my millisecond timer to use QueryPerformanceFrequency under Windoze. Here's some results:

Timer      Time      Count      Mean(Secs)      Name
0      21858061      1      6.106380      pseudo_main
1      3360      870      0.000001      log(2*N)/log(a)
2      2679      870      0.000001      fmod(sum+(double) s/ (double) av,1.0)
3      13135784      2947560      0.000001      k
4      21858043      1      6.106375      a
5      16012      870      0.000005      pow_mod;mul_mod
6      35474      870      0.000011      next_prime
7      13547      6776      0.000001      is_prime
8      9199      870      0.000003      pow_mod
9      21787312      870      0.006996      k2

My timer system records up to 10 timers. The time field is the total time spect. The count is the number of times that bit of code was performed, the mean(Secs) is the mean time spent in the section. The name should be obvious.

Here's the source so you can see what sections I was timing:

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <time.h>

extern "C" {
#include "CDefs.h"
#include "CodeTime.h"
 void Idle ( void ) {};
 DEBUG;
};

TimerData CodeTimers[NTIMES] = {0};
/* #define HAS_LONG_LONG */

#ifdef HAS_LONG_LONG
#define mul_mod(a,b,m) (( (long long) (a) * (long long) (b) ) % (m))
#else
#define mul_mod(a,b,m) (int)fmod( (double) a * (double) b, m)
#endif

/* returns the inverse of x mod y */

int inv_mod(int x,int y) {
      int q,u,v,a,c,t;
      
      u=x;
      v=y;
      c=1;
      a=0;
      do {
            q=v/u;
            
            t=c;
            c=a-q*c;
            a=t;
            
            t=u;
            u=v-q*u;
            v=t;
      } while (u!=0);
      a=a%y;
      if (a<0) a=y+a;
      return a;
}

/* returns (a^b) mod m */

int pow_mod(int a,int b,int m)
{
      int r,aa;
      TESTING(StartTime(8,"pow_mod"););
      
      r=1;
      aa=a;
      while (1) {
            if (b&1) r=mul_mod(r,aa,m);
            b=b>>1;
            if (b == 0) break;
            aa=mul_mod(aa,aa,m);
      }
      TESTING(StopTime(8););
      return r;
}

/* returns true if n is prime */

int is_prime(int n)
{
      int r,i;
      int rnsr = 1;
      TESTING(StartTime(7,"is_prime"););
      if ((n % 2) == 0) return 0;
      
      r=(int)(sqrt(n));
      for(i=3;i<=r&&rnsr==1;i+=2) if ((n % i) == 0) rnsr = 0;
      TESTING(StopTime(7););
      return rnsr;
}

/* returns the prime number immediatly after n */

int next_prime(int n)
{
      TESTING(StartTime(6,"next_prime"););
      do {
            n++;
      } while (!is_prime(n));
      TESTING(StopTime(6););
      return n;
}

/* finds out nine digits of pi starting from n */

/* will probably be the main function in openmosix capable version*/

int pseudo_main(int n)
{
      int av,a,vmax,N,num,den,k,kq,kq2,t,v,s,i;
      double sum;
      
//      double start_time, elap_time;
      TESTING(StartTime(0,"pseudo_main"););
//      start_time = (double) clock ();
      N=(int)((n+20)*log(10)/log(2));
      
      sum=0;
      
      TESTING(StartTime(4,"a"););
      for(a=3;a<=(2*N);a=next_prime(a)) {
            
      TESTING(StartTime(1,"log(2*N)/log(a)"););
            vmax=(int)(log(2*N)/log(a));
            av=1;
            for(i=0;i<vmax;i++) av=av*a;
      TESTING(StopTime(1););
            
            s=0;
            num=1;
            den=1;
            v=0;
            kq=1;
            kq2=1;
            
      TESTING(StartTime(9,"k2"););
            for(k=1;k<=N;k++) {
                  
      TESTING(StartTime(3,"k"););
                  t=k;
                  if (kq >= a) {
                        do {
                              t=t/a;
                              v--;
                        } while ((t % a) == 0);
                        kq=0;
                  }
                  kq++;
                  num=mul_mod(num,t,av);
                  
                  t=(2*k-1);
                  if (kq2 >= a) {
                        if (kq2 == a) {
                              do {
                                    t=t/a;
                                    v++;
                              } while ((t % a) == 0);
                        }
                        kq2-=a;
                  }
                  den=mul_mod(den,t,av);
                  kq2+=2;
                  
                  if (v > 0) {
                        t=inv_mod(den,av);
                        t=mul_mod(t,num,av);
                        t=mul_mod(t,k,av);
                        for(i=v;i<vmax;i++) t=mul_mod(t,a,av);
                        s+=t;
                        if (s>=av) s-=av;
                  }
                  
      TESTING(StopTime(3););
            }
      TESTING(StopTime(9););

      TESTING(StartTime(5,"pow_mod;mul_mod"););
            t=pow_mod(10,n-1,av);
            s=mul_mod(s,t,av);
      TESTING(StopTime(5););
      TESTING(StartTime(2,"fmod(sum+(double) s/ (double) av,1.0)"););
            sum=fmod(sum+(double) s/ (double) av,1.0);
      TESTING(StopTime(2););
      }
      TESTING(StopTime(4););
      
      //elap_time = ((double) clock () - (double) start_time) / CLOCKS_PER_SEC;
      TESTING(StopTime(0););
      
      printf("Position: %9d Digits: %09d\n",n,(int)(sum*1e9));
      
      return 0;
}

/* calls up the pseudo_main function to find 9 digits at a time */

int main (int argc,char *argv[])
{
      int position=1,counter=1000; //counter is the digit of pi to start from.
      //  int number;
      //  double start_time, elap_time;
      
      pseudo_main(counter);
      
      DumpTimes();
      //Pause Code
      //system ("pause"); // execute M$-DOS' pause command
      return 0;
      
}

Its the TESTING(StartTime... TESTING(StopTime... that records the moments.

The results suggest some pretty obvious results:

of the 21858061 'ticks' spent in the program (pseudo_main timer) 21858043 is spent in the 'a' loop and, of that time, 13135784 ticks are spent in the 'k' loop. Inside this loops, next_prime, is_prime and pow_mod take almost trivial time.

I'd therefore recommend you try to ease the burden of all teh variable in the pseudo_main function. Break pseudo_main down into multiple calls. This allows the compiler to establish a new set of register variables in each call.

My CodeTime.h header looks like:

/*
      This header defines some functions that allow you to time certain processes
      at the millisecond/microsecond level.

      Use it as follows.

      1.      Somewhere in your code, probably best in a main module, declare the timer
            data with:
            TimerData CodeTimers[NTIMES] = {0};

      2.      Functions you wish to time can be surrounded by StartTime and StopTime calls.
            E.G.
            TESTING(StartTime(4,"ExpandFromFile"););
            length = ExpandFromFile ( &p[i+1], &expanded[j] );
            TESTING(StopTime(4););

            The name will appear in the log so I use the name of the function I am timing
            so I can use the code browser to get there quickly.

      3.      At any time - I usually do it at program terminate - call DumpTimes().
            This will give you a dump of the form:
            Timer[0]      570      131      ShowBasics
            Timer[1]      170      131      ShowNumbers #
            Timer[2]      3110      131      ShowNumbers $
            Timer[3]      2420      12196      ExpandParameter
            Timer[4]      1380      8782      ExpandFromFile
            Timer[5]      0      7      ReadFromFile
            Timer[6]      0      0      
            Timer[7]      0      0      
            Timer[8]      0      0      
            Timer[9]      3950      131      ShowBackground
            Giving Timer number, total time taken (in ms), call count and the supplied name.
            This data is tab delimited so you can copy and paste it into a spreadsheet.


*/

// Timing code to log millisecond times.
#include <sys\TimeB.h>

// Number of timers you can keep.
#define NTIMES      10

#ifdef WIN32
#include "WINDOWS.H"
//#include "WINBASE.H"
#define RecordedTime LONGLONG
#define ElapsedTime LONGLONG
#define fElapsed "%I64d"
#else
// Ansi stuff.
#define RecordedTime struct timeb
#define ElapsedTime long
#define fElapsed "%ld"
#endif

// One timer data structure.
typedef struct
{
      RecordedTime startTime;
      ElapsedTime elapsed;
      long count;
      char * name;
} TimerData;

#ifdef CODETIME

// Please allocate one of these somewhere and make it initially empty.
// E.G. TimerData CodeTimers[NTIMES] = {0};
extern TimerData CodeTimers[NTIMES];

// Call this to start a timer.
#ifdef WIN32
#define StartTime(n,fname) \
{\
      CodeTimers[n].name = fname;\
      CodeTimers[n].count += 1;\
      QueryPerformanceCounter((LARGE_INTEGER*)&CodeTimers[n].startTime);\
}
#else
#define StartTime(n,fname) \
{\
      CodeTimers[n].name = fname;\
      CodeTimers[n].count += 1;\
      ftime(&CodeTimers[n].startTime);\
}
#endif

// Call this to stop the timer.
#ifdef WIN32
#define StopTime(n) \
{\
      RecordedTime stopTime;\
      QueryPerformanceCounter((LARGE_INTEGER*)&stopTime);\
      CodeTimers[n].elapsed += stopTime - CodeTimers[n].startTime;\
}
#else
#define StopTime(n) \
{\
      struct timeb stopTime;\
      ftime(&stopTime);\
      CodeTimers[n].elapsed += ((stopTime.time - CodeTimers[n].startTime.time) * 1000L) + ((long)stopTime.millitm - CodeTimers[n].startTime.millitm);\
}
#endif

// Call this on exit or whenever you want the times logged.
#define DumpTimes()\
{\
      int i;\
      RecordedTime freq;\
      QueryPerformanceFrequency((LARGE_INTEGER*)&freq);\
      TFPRINT(("Timer\tTime\tCount\tMean(Secs)\tName"));\
      for ( i = 0; i < NTIMES; i++ )\
      {\
      TFPRINT(("%d\t"fElapsed"\t%ld\t%f\t%s", i, CodeTimers[i].elapsed, CodeTimers[i].count, CodeTimers[i].count > 0 ? (double)CodeTimers[i].elapsed / (double)CodeTimers[i].count / freq: 0.0,CodeTimers[i].name == NULL ? "": CodeTimers[i].name ));\
      }\
}

#else
// Call this to start a timer.
#define StartTime(n,fname)
// Call this to stop the timer.
#define StopTime(n)
// Call this on exit or whenever you want the times logged.
#define DumpTimes()

#endif

Note that my enhancements have upped its resolution beyiond milliseconds under a WIN32 environment. TFPRINT Is just like printf but with two ((.

Good luck.

Paul
0
 

Author Comment

by:EvilGuru
ID: 13913233
That is amazing! Do you think there is much room for improvements in the k loops, and this may sound a tad noobie but would you be able to explain how splitting up the pseudo_main() call into smaller parts would help make it faster? Sorry, this is the first time I have needed to optimise my code :) Thanks for any help/advice you can give.
0
 
LVL 22

Expert Comment

by:grg99
ID: 13916752
There's a big problem with using QueryPerformanceTimer()-- it has a LOT of overhead per call.    
Try calling it twice in sucession and print out the time difference.  It's probably around 1500 clock cycles.

So it's not very useful for timing short instruction sequences.   You can make it a bit better by subtracting the overhead time from the printouts.

As a better alternative, you might try using the RDTSC instruction, which is at least 40 times faster, but still awfully slow, around 45 clocks.

0
 
LVL 16

Expert Comment

by:PaulCaswell
ID: 13916801
Greg,

Good points! I'll investigate these. I only want the figures relative so it QueryPerformanceTimer is nearly what I want. My earlier version that used ftime just wasnt up to it. It was recording millions of calls taking zero time because the calls themselves took less than 1ms.

EvilGuru,

Splitting can make a heap of difference. The aim shuld be to reduce the number of local stack variables. The compiler can then more efficiently make use of registers and instruction cacheing. Remember also that there is  stack overhead so dont go too far. I'd suggest you put the 'a' loop in one function and the 'k' loop in another, then see if you've made a difference.

Paul
0
 
LVL 3

Expert Comment

by:joju
ID: 13916911
I suspect fmod().
solution some fast fmod func ?
0
 

Author Comment

by:EvilGuru
ID: 13919551
PaulCaswell, you mean something like:

pseudo_main {
int a(vars here);
}
a {
a code here
int k(vars here);
}
Or have I got the wrong end of the stick?

According to the timer fmod takes up very little time, so I while I did think it was that I now dont. But a faster one cant hurt.
0
 
LVL 16

Expert Comment

by:PaulCaswell
ID: 13921202
EvilGuru,

This is the sort of thing I was suggesting. I've only pulled the a_loop code out but I am sure you will understand what I mean. I hope you can pull the k_loop out yourself.

// The 'a' part of the loop.
void a_loop ( int N, int n, int a, double * sum )
{
      int av,vmax,num,den,k,kq,kq2,t,v,s,i;
      vmax=(int)(log(2*N)/log(a));
      av=1;
      for(i=0;i<vmax;i++) av=av*a;
      
      s=0;
      num=1;
      den=1;
      v=0;
      kq=1;
      kq2=1;
      
      for(k=1;k<=N;k++) {
            
            t=k;
            if (kq >= a) {
                  do {
                        t=t/a;
                        v--;
                  } while ((t % a) == 0);
                  kq=0;
            }
            kq++;
            num=mul_mod(num,t,av);
            
            t=(2*k-1);
            if (kq2 >= a) {
                  if (kq2 == a) {
                        do {
                              t=t/a;
                              v++;
                        } while ((t % a) == 0);
                  }
                  kq2-=a;
            }
            den=mul_mod(den,t,av);
            kq2+=2;
            
            if (v > 0) {
                  t=inv_mod(den,av);
                  t=mul_mod(t,num,av);
                  t=mul_mod(t,k,av);
                  for(i=v;i<vmax;i++) t=mul_mod(t,a,av);
                  s+=t;
                  if (s>=av) s-=av;
            }
            
      }
      
      t=pow_mod(10,n-1,av);
      s=mul_mod(s,t,av);
      *sum=fmod(*sum+(double) s/ (double) av,1.0);
}

/* finds out nine digits of pi starting from n */

/* will probably be the main function in openmosix capable version*/

int pseudo_main(int n)
{
      int a,N;
      double sum;
      
//      double start_time, elap_time;
      TESTING(StartTime(0,"pseudo_main"););
//      start_time = (double) clock ();
      N=(int)((n+20)*log(10)/log(2));
      
      sum=0;
      
      TESTING(StartTime(4,"a"););
      for(a=3;a<=(2*N);a=next_prime(a)) {
            a_loop ( N, n, a, &sum );
      }
      TESTING(StopTime(4););
      
      //elap_time = ((double) clock () - (double) start_time) / CLOCKS_PER_SEC;
      TESTING(StopTime(0););
      
      printf("Position: %9d Digits: %09d\n",n,(int)(sum*1e9));
      
      return 0;
}

0
 

Author Comment

by:EvilGuru
ID: 13921970
After a good bit of playing around with it I replaced:
#ifdef HAS_LONG_LONG
#define mul_mod(a,b,m) (( (long long) (a) * (long long) (b) ) % (m))
#else
#define mul_mod(a,b,m) fmod( (double) a * (double) b, m)
#endif
with:
#define mul_mod(a,b,m) (( (__int64) (a) * (__int64) (b) ) % (m))

My time for 10,000 went down from 85 seconds to 42 seconds! How bigger benefit do you think I will get from splitting up the functions and are there any other causes where you think something like __int64 will help?
0
 
LVL 16

Accepted Solution

by:
PaulCaswell earned 680 total points
ID: 13924807
Awesome! A 50% improvement! You probably wont get too much more.

Function splitting CAN do amazing things. Imagine the difference between 't' stored in ram on the stack and 't' in a register! You are limited to a certain number of registers, if you can break down the code into chunks small enough to ensure that ALL calculations work with register variables then you will see astonishing improvements.

Much of the optimisation will involve trial and error. I notice that:

#define mul_mod(a,b,m) (( (__int64) (a) * (__int64) (b) ) % (m))

involves a multiply and a divide. Have you tried carrying the result as a fraction? I.e. quotient and remainder. You can often get quite good improvements by reducing the number of divisions.

Another idea! I notice:

               do {
                    t=t/a;
                    v--;
               } while ((t % a) == 0);

and

               do {
                    t=t/a;
                    v++;
               } while ((t % a) == 0);

Could this be changed to:

v -= countFactors(&t,a);

and

v += countFactors(&t,a);

You could then write countFactors as:

int countFactors(int * tp, int a)
{
int count = 0;
register int t = *tp;
do {
 t/=a;
 count++;
} while ((t % a) == 0);
*tp = t;
return count;
}

I bet the compiler would be able to optimise this big-time. It may not help but its the essence of optimising! Either prove where the time is going or tighten up the code.

Paul

P.S. You are hoping to distribute this process! Can you assume that __int64 is as efficient on all machines as it is on yours? Perhaps fmod would be better on some?
0
 

Author Comment

by:EvilGuru
ID: 13927002
The program itself will come with a .cfg file where I will have an option saying:
Use __int64 = yes;
However on the machines I have tried it on the speed increase has been quite large, but I will give the option to swtich.
You ahve helped so much! Thanks, answer accepted.
0

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Summary: This tutorial covers some basics of pointer, pointer arithmetic and function pointer. What is a pointer: A pointer is a variable which holds an address. This address might be address of another variable/address of devices/address of fu…
Examines three attack vectors, specifically, the different types of malware used in malicious attacks, web application attacks, and finally, network based attacks.  Concludes by examining the means of securing and protecting critical systems and inf…
The goal of this video is to provide viewers with basic examples to understand and use structures in the C programming language.
Video by: Grant
The goal of this video is to provide viewers with basic examples to understand and use nested-loops in the C programming language.
Suggested Courses

810 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question