Solved

PI algorithm

Posted on 1998-10-18
20
530 Views
Last Modified: 2010-04-02
I have been working on a PI algorithm that I got off the internet a long time ago.  I am not sure what is called or anything, but I have a few questions.  First, here is the code (a little MFC):

// Global variables (this WILL BE REDESIGNED).
const long d1 = 239, d = 25;
long m; // m = m_NumDigits + 2
long i2, i3;  // These variables are busy, but not like th ones shown below:
long i, r, v; // These variables are all very busy all the time...

// Other Functions:
void div_acst (char *&, int &);
void a_is_afromp (char *&, char *&);

// ....

  char *a = (char *) malloc (sizeof (char) * (unsigned long) 1000000);
  char *p = (char *) malloc (sizeof (char) * (unsigned long) 1000000);
  char *t = (char *) malloc (sizeof (char) * (unsigned long) 1000000);
  int divisor;
  long counter;
  time_t orgtime;
  time (&orgtime);

  if (a == NULL || p == NULL || t == NULL)
  { MessageBox ("Not enough memory :-(");
    return;
  }

  UpdateData (TRUE);

  // Reduce priority of this to idle time:
  AfxGetApp ()->SetThreadPriority (THREAD_PRIORITY_IDLE);

  // Minimize the window:
  POINT pt;
  GetCursorPos (&pt);
  SendMessage (WM_SYSCOMMAND, SC_MINIMIZE, pt.y << 16 | pt.x);
  AfxGetApp ()->m_pMainWnd->SendMessage (WM_SYSCOMMAND, SC_MINIMIZE, pt.y << 16 | pt.x);

  m_Results = "";
  CWaitCursor cursor;

  m = m_NumDigits + 2;

  i2 = (3 * m / 8) * 4 + 3;
  i3 = (i2 / 14) * 4 + 3;

  /* COMPUTE PROCEDURE */
  i = i2;

  /* DIV32IA */
  a[0] = (char) (3 / i);
  r = 3 % i;
  v = r * 10 + 2;
  a[1] = (char) (v / i);

  r = v % i;
  for (counter = 2; counter <= m; counter++)
  { v = r * 10;
    a[counter] = (char) (v / i);
    r = v % i;
  }

  /* DIVAD */
  divisor = d;
  div_acst (a, divisor);

  for (i = i2 - 2; i >= 3; i -= 2)
  { /* DIV32IP */
    p[0] = (char) (3 / i);
    r = 3 % i;
    v = r * 10 + 2;
    p[1] = (char) (v / i);
    r = v % i;
    for (counter = 2; counter <= m; counter++)
    { v = r * 10;
      p[counter] = (char) (v / i);
      r = v % i;
    }

    /* SUBA */
    a_is_afromp (a, p);

    /* DIVAD */
    div_acst (a, divisor);
  }

  /* SUBT32A */
  t[0] = 3;
  t[1] = 1;
  for (counter = 2; counter <= m; counter++)
  { t[counter] = 9 - a[counter]; }

  i = i3;

  /* DIV4IA */
  a[0] = (char) (4 / i);
  r = 4 % i;
  for (counter = 1; counter <= m; counter++)
  { v = r * 10;
    a[counter] = (char) (v / i);
    r = v % i;
  }

  /* DIVAD1 */
  divisor = d1;
  div_acst (a, divisor);

  /* DIVAD1 */
  div_acst (a, divisor);

  for (i = i3 - 2; i >= 3; i -= 2)
  { /* DIV4IP */
    p[0] = (char) (4 / i);
    r = 4 % i;
    for (counter = 1; counter <= m; counter++)
    { v = r * 10;
      p[counter] = (char) (v / i);
      r = v % i;
    }

    /* SUBA */
    a_is_afromp (a, p);

    /* DIVAD1 */
    div_acst (a, divisor);

    /* DIVAD1 */
    div_acst (a, divisor);
  }

  /* SUB4A */
  a[0] = 3;
  for (counter = 1; counter <= m; counter++)
  { a[counter] = 9 - a[counter]; }

  /* DIVAD1 */
  div_acst (a, divisor);

  /* SUBT */
  for (counter = m; counter >= 1; counter--)
  { a[counter] = t[counter] - a[counter];
    if (a[counter] < 0)
    { a[counter] += 10;
      a[counter - 1]++;
    }
  }

  CString temp;
  temp.Format ("pi =\t3.");
  m_Results += temp;
  for (counter = 1; counter <= m_NumDigits; counter++)
  { temp.Format ("%1i", a[counter]);
    m_Results += temp;

    if (counter % 5 == 0)
    { m_Results += " "; }

    if (counter % 50 == 0)
    { temp.Format ("\t%li\r\n\t", counter);
      m_Results += temp;
    }
  }

  time_t currenttime;
  time (¤ttime);
  temp.Format ("Time took in seconds: %lu", currenttime - orgtime);
  m_Results += temp;

  // Get thread running again:
  AfxGetApp ()->SetThreadPriority (THREAD_PRIORITY_NORMAL);

  // Restore the window:
  GetCursorPos (&pt);
  SendMessage (WM_SYSCOMMAND, SC_RESTORE, pt.y << 16 | pt.x);
  AfxGetApp ()->m_pMainWnd->SendMessage (WM_SYSCOMMAND, SC_RESTORE, pt.y << 16 | pt.x);

  UpdateData (FALSE);

  free (a);
  free (p);
  free (t);

void a_is_afromp (char *&a, char *&p)
{ long counter;

  for (counter = m; counter >= 0; counter--)
  { a[counter] = p[counter] - a[counter];
    if (a[counter] < 0)
    { a[counter] += 10;
      a[counter - 1]++;
    }
  }
}

void div_acst (char *&a, int &divisor)
{ long counter;
  r = 0;
  for (counter = 0; counter <= m; counter++)
  { v = r * 10 + a[counter];
    a[counter] = v / divisor;
    r = v % divisor;
  }
}

Questions:
1) How to find out how much memory I need?  I know it is based on the number of digits, but quite frankly I can't figure this algorithm out, hence I am not sure how much memory I need to allocate.  If I take the number of digits and add 5, that is not enough in case that is any indication.

2) Is the algorithm efficient?  How can it be sped up?  The amount of time it takes to calculate more digits seems to go up non-linearly (the curve on the graph gets sharper the more digits you want to calculate).  Note that I am going to redesign the functions so I do NOT need the global variables.  This program was written a VERY long time ago, hence the C architecture and everything else that is poor by today's standards.  That is why I am curious if this algorithm is indeed very efficient.  I have not had any experience with any other PI calculating programs.

If you can answer one of the above questions, but not the other, I will post a separate question worth 50 points.  If you can help with both, I will give you 100.  Thank you!
0
Comment
Question by:gosh
  • 9
  • 6
  • 4
  • +1
20 Comments
 

Author Comment

by:gosh
ID: 1175448
Edited text of question
0
 
LVL 22

Expert Comment

by:nietod
ID: 1175449
>> Note that I am going to redesign the functions so I do NOT need the global variables.
I would recomend you create a class for storing BCD (Binary Coded Decimals (This method of expressing numbers)).  The class should support the arithmetic operations you need.  It should also be written so that the storage space will expand to accomodate the number it needs to store.  

>> How to find out how much memory I need?
what memory are you talking about?  the size of a, p, and t?  That looks like it should be tne number of digits you want to store.  (Seems like it includes room for the initial "3.".)

>> The amount of time it takes to calculate more
>> digits seems to go up non-linearly
There is no way around that.

>> How can it be sped up?
You could consider using the BCD arithmatic instructions for the x86 processors.  This will speed things up by a notecable amount, maybe 20%, but no huge difference.

You could also consider using packed BCD.  The processor supports it and it is faster in some cases.
0
 
LVL 1

Expert Comment

by:Bonev
ID: 1175450
I would suggest hexadecimal arithmetics, not BCD.
1) It would be faster, because you can take advantage of the 32-bit registers.
2) The numbers will be more efficiently stored.

0
 
LVL 22

Expert Comment

by:nietod
ID: 1175451
What is hexadecimal arithmetics?
0
 
LVL 1

Expert Comment

by:Bonev
ID: 1175452
Nietod, I'm sure you know what I am talking about.

All integer types in the computers are binary coded. And four binary digits form a hexadecimal digit. Because the binary numbers are very long (16/32 digits) they are very inconvenient to use. That's why everybody uses the hexadecimal representation of the numbers.
If you use BCD, the numbers don't use the full capacity of the storage space - for example, in one byte (8 bits) you can store numbers up to 99, but in binary form you can store numbers up to 255. In longer multi-byte numbers the ratio is even worse.
The arithmetic operations on BCD (x86) require decimal adjustment. Well, it is not time consuming, but why to use it, when you can skip it.
The arithmetic operations on binary (hex) numbers are very fast because you can even treat a 32-bit number as a single digit in a multi-byte number, thus reducing the number of instructions needed to perform the arithmetic operations.

0
 
LVL 22

Expert Comment

by:nietod
ID: 1175453
Do you just mean multi-byte binary integers?  Like say 64 bit integers?  Those are faster than BCD, but they are a pain to work with.  (A pain to debug because you can't check your work by "looking at them" and you can't punch them into a hex-dec calculator to check either.  With BCD you can look a number and understand it, you can follow through calculations and make sure they are correct.  Also converting to ASCII is easy.  But BCD is slower and, as the number grows in size, it gets slower about 2.5 times faster.)  
0
 
LVL 1

Expert Comment

by:Bonev
ID: 1175454
Yes, but not limited to 64-bit. Of course, for debug purposes you will have to provide hex-dec converter routines. But your idea to create a class for long integers is very good - during the development process you can use BCDs, but for the release you can use binary integers.
0
 

Author Comment

by:gosh
ID: 1175455
Hello everyone, thanks for all the input.  I have a few questions though:

For nietod:
BCD, that is, Binary Coded Decimal?  Is that when you take, say two digits and put them in one byte since all you need is 4 bits to represent one digit?  If so, is there any advantage to doing that other than saving memory?  Memory is no problem for me here...

>> consider using the BCD arithmetic instructions
What are these?  Can you please explain that a little more?  I am not a very knowledgeable programmer.

>> BCD is slower and, as the number grows in size, it gets
>> slower about 2.5 times faster
So should I use BCD or no?

For Bonev:
>> for the release you can use binary integers.
Like I said, I am certainly no professional, can you please elaborate on this a little more?

Both:
Thank you for all of your help.  I remember reading somewhere a LONG time ago about some algorithm that would let you calculate an individual digit of PI, anywhere down the line.  Would either of you happen to know anything about such a project?

Thanks again!
0
 
LVL 22

Expert Comment

by:nietod
ID: 1175456
BCD is a way of using binary numbers to represent decimal numbers.  The most common ways (especially on intel computers) is unpacked BCD where each byte has binary values from 0 to 9.  This is the simplist form of BCD.  The other method is packed BCD where each nibble (half byte) has binary values from 0 to 9.  This allows a pair of digits to be stored in a single byte.  Packed BCD has the advantage that it uses hald as many bytes to store a number.  In some cases, like addition and subtraction especially, it may be faster (up to twice as fast) as unpacked BCD.

>> consider using the BCD arithmetic instructions
>>     What are these?  Can you please explain that a little more?  
>> I am not a very knowledgeable
If you are familiar with Intel assembly, there are instructions for performing BCD arithmatic.  They would speed up your processing slightly--not a huge savings, maybe 10% or so.  If you don't know intel assembly, I doubt it is worth learning for this, unless speed is really an issue.  (Rewritting the core calculations in assembly and using the BCD instructions would probably give you a 50% increase.)  (I did the reverse--went from assembly to C++ and saw about a 100% decrease.)

>> BCD is slower and, as the number grows in size, it gets
>> slower about 2.5 times faster
>>     So should I use BCD or no?
That is what I recomend for large numbers.  It is slower than multi-byte operands as suggested by Bonev.  But it is much easier to debug.  With BCD you can "watch" the numbe change as you debug and you will be able to understand what is going on.  i.e. with BCD you can look at two 1000 digit numbers being added and say, yes, that is the right result.  Not a chance with multi-byte binary numbers.  They are huge binary numbers that can't be converted to decimal in your head or even with a bin-dec calculator.  You can write procedures to do the conversion, but those aren't much good as you are debuging, unless you find a place you suspect, and then add the conversion, recompile and test again.  

>>For Bonev:
>>     >> for the release you can use binary integers.
>>     Like I said, I am certainly no professional, can you please elaborate on this a little more?
That's a good compromise really, but twice as much programming work for you.  He is saying create two classes that can represents huge numbers  (and make them so they can expand to hold a number of any size needed)  one class would use BCD to store the nubmers and would be slower but easier to debug.  The other class would use multi-byte binary and would be faster and harder to debug.  You can use conditional compilation to switch between the two.

>> I remember reading somewhere a LONG time ago about
>> some algorithm that would let you calculate an individual digit of PI,
I doubt that you can calculate any digit without calculating the previous ones (or doing a similar amount of work)  The person to ask about that is ozo.  He knows this stuff.  You might want to post a question with his name on it pointing him here.  He isn't around all the time though.
0
 
LVL 1

Expert Comment

by:Bonev
ID: 1175457
Gosh,
Regarding the long-number arithmetics:
Let's start with the BCD case, because as Nietod mentioned is easy to implement, debug and understand. I'll explain only the addition of two numbers - the other operations use the same idea. Of course, you know how to add two numbers on a piece of paper:
You align the numbers by their last digit. Starting from the last digit, you add the coresponding digits and if the result is less than 10 you write it down directly. If not, you subtract 10, write down the resulting number and remember that you have carried 1 toward the next digit. When you add the next digits you add that 1. Repeat that until you finish with the last digits.
You can implement that pretty straightforward:
#define MAX_DIGIT 10  // how many digits

char a[MAX_DIGIT], b[MAX_DIGIT], c[MAX_DIGIT]; // long numbers
// each element in the array represents a single decimal digit

int carry = 0;
for( int i = MAX_DIGIT-1; i >= 0; i-- )
{
      int res = a[MAX_DIGIT] + b[MAX_DIGIT] + carry;
      c = res % 10; // instead of subtraction we can use modulo
      carry = res / 10;
}

The disadvantage is that this code and number representation is very inefficient.
The packed BCDs version cannot be done using pure C/C++ - it's specific to the x86 family of microprocessors. But I can show you how to get the same result with a minor change to the previous code:

char a[MAX_DIGIT], b[MAX_DIGIT], c[MAX_DIGIT]; // long numbers
// each element in the array represents TWO decimal digits

int carry = 0;
for( int i = MAX_DIGIT-1; i >= 0; i-- )
{
      int res = a[MAX_DIGIT] + b[MAX_DIGIT] + carry;
      c = res % 100; // 100, not 10
      carry = res / 100;
}

The difference is that now we keep two digits in a single byte instead of one. This doubles the speed and reduces the needed memory.
The binary arithmetics is the same - the only difference is that it uses the full capacity of the memory (a single byte can hold values up to 255).


0
Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

 
LVL 22

Expert Comment

by:nietod
ID: 1175458
>>The packed BCDs version cannot be done using pure C/C++ - it's
>>specific to the x86 family of microprocessors
Its not specific to x86.  The Motorola 68000 processors support it (as well as a signed version). And I would suspect most other processors do.

It can be handled using pure C/C++.  It just requires some "bit" manipulations.

For speed you do nopt want to use / and % in the addition and subtraction algorithithms.  You can just compare the result with 9 (or 0 for subtraction) and make adjustments if needed.  That is far faster.
0
 
LVL 1

Expert Comment

by:Bonev
ID: 1175459
>>Its not specific to x86.  
Agree.

>>It can be handled using pure C/C++.  It just requires some "bit" manipulations.
These manipulations will be times slower than the single processor instruction.

>>For speed you do not want to use / and % in the addition and subtraction algorithithms.  
Sure. I did it for illustration purposes and shorter code.

0
 
LVL 22

Expert Comment

by:nietod
ID: 1175460
>> >>It can be handled using pure C/C++.  It just requires some "bit" manipulations.
>>     These manipulations will be times slower than the single processor instruction.
Slower, but not much.  The x86 BCD instructions were given low priority and thus are not heavily optimized and the manipulations that need to be done are relatively minor.  My library usses conditional compilation to do it both way--uses inline assembly on x86 processors and regular C++ on others.  The difference is minor.  This is especially surprising as the inline assembly code is increadibly optimized and the assembly generated by the C++ compiler is far from it.  Had I known, I would not have bothered with the assembly version.
0
 
LVL 1

Expert Comment

by:Bonev
ID: 1175461
It really sounds strange.

0
 
LVL 22

Expert Comment

by:nietod
ID: 1175462
For example, on a pentium the DAA instruction is 4 clocks cycles.  (They didn't optimize it.)  The equivalent code is 8 instructions, all optimized to one clock cycle (can run in as little as 6 clocks in two pipes--I think).  Thus the C++ version takes about 2 twice as long.  The C code should be even slower as it has terrible register usage.  But in practice, the difference between the two is nearly unmeasurable, probably because this is not much of a bottleneck.
0
 

Author Comment

by:gosh
ID: 1175463
Okay, I think all of my problems have been solved.  Since you both helped so much, I will post a separate question (50pts) for someone to answer.  One of you please answer, then I will post the "question" addressed to the other one.  Thanks for all the help!
0
 
LVL 22

Accepted Solution

by:
nietod earned 50 total points
ID: 1175464
Alright, I'll answer this one, so I don't have to go searching...

thanks.
0
 
LVL 84

Expert Comment

by:ozo
ID: 1175465
0
 

Author Comment

by:gosh
ID: 1175466
To Bonev: So as not to clutter up the C++ area, I will post a question for you in the lounge labeled "For Bonev"

It is: http://www.experts-exchange.com/Q.10092105

Thank you both!
0
 
LVL 22

Expert Comment

by:nietod
ID: 1175467
Timing is everything, ozo...  I guess you don't have a lot of time to answer questions.  You must spend all your time reading about this stuff.
0

Featured Post

Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

Join & Write a Comment

Introduction This article is the first in a series of articles about the C/C++ Visual Studio Express debugger.  It provides a quick start guide in using the debugger. Part 2 focuses on additional topics in breakpoints.  Lastly, Part 3 focuses on th…
This article will show you some of the more useful Standard Template Library (STL) algorithms through the use of working examples.  You will learn about how these algorithms fit into the STL architecture, how they work with STL containers, and why t…
The viewer will learn how to user default arguments when defining functions. This method of defining functions will be contrasted with the non-default-argument of defining functions.
The viewer will learn additional member functions of the vector class. Specifically, the capacity and swap member functions will be introduced.

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now