Solved

# Sums the series

Posted on 2000-04-08
Medium Priority
357 Views
I wrote two programs that sum the series 1 + (1/2) + (1/2)^2 + ... + (1/2)^99,
once in left-to-right order and once in right-to-left order.  Are the results the same or different?
Explain why.

My programs:

First Program

#include <iostream>
using namespace std;

int main() {
    double Sum = 0;
    long double Denominator = 1.2676506E30;
    double Numerator = 1;
    int ValuesProcessed = 0;
    double Fraction;
    while (ValuesProcessed < 100) {
        Fraction = Numerator / Denominator;
        Sum += Fraction;
        Denominator /= 2;
        ++ValuesProcessed;
    }
    cout << "Sum is: " << Sum << endl;
    cout << "Values is: " << ValuesProcessed << endl;
    return 0;
}

Second program

#include <iostream>
using namespace std;

int main() {
    double Sum = 0;
    long double Denominator = 1;
    double Numerator = 1;
    int ValuesProcessed = 0;
    double Fraction;
    while (ValuesProcessed < 100) {
        Fraction = Numerator / Denominator;
        Sum += Fraction;
        Denominator *= 2;
        ++ValuesProcessed;
    }
    cout << "Sum is: " << Sum << endl;
    cout << "Values is: " << ValuesProcessed << endl;
    return 0;
}

Thanks,

Jairo Cardenas
Colombia
Question by:jcardenas
LVL 1

Expert Comment

ID: 2696442
of course it should be the same, 'cause it's a sum, and in a sum the order doesn't matter

first
2^100 = 1267650600228229401496703205376

second, your loop only goes up to 2^99,
i mean 1/2^99

so in the first you do
1/2^100 + 1/2^99 + ... + 1/2
and in the second you do
1 + 1/2 + ... + 1/2^99

and in the first you are also wrong about
the Denominator, 'cause 2^100 = 1267650600228229401496703205376
and you rounded it
to: 1267650600000000000000000000000

LVL 22

Expert Comment

ID: 2696450
I assume this is an academic question.  We cannot provide answers to academic questions; that is unethical.  However, we can give you some limited assistance, like you might get from your teacher.

I can tell you this much: according to the rules of mathematics, there should be no difference in the results, because, as ntdragon said, you can sum numbers in any order and get the same result (the commutative property of addition).  However, when you do this on a computer you will definitely get a difference.  There is a good reason for this.  (Good in the sense that it makes sense for this to happen, bad in the sense that it would be best if it didn't happen.)   Think about it.

LVL 1

Expert Comment

ID: 2696616
nietod,
jcardenas has a bug in his program,
and I helped him find it.
Now he has to fix it.

LVL 22

Expert Comment

ID: 2696708
This is an academic question.  The person who created the question knew the answer to it.

Try the program yourself (with the fix).  You will find that the results will not be the same.

LVL 3

Expert Comment

ID: 2697054
Since the mantissa is only 64 bits for long doubles (63 fraction bits plus an explicit integer bit), the differences will start occurring once you get past the 64th term of the sum.  I'm assuming here that everything is done in assembly code AND not by the compiler.
For example, 1/2^100 has a mantissa of all zeroes.  BUT, when you add 1/2^99, the mantissa looks like
1000... and the pattern continues (add the next term in the sum and an additional 1 appears).  You run out of mantissa bits after 64 sums.
SO, since errors are introduced no matter how you code it, why would one expect the sums to be the same?
On the other hand, if you programmed the Intel coprocessor yourself, and, as ntdragon correctly pointed out, if the two programs were actually trying to compute the exact same sum, you would get identical results.  I have to stress that I'm assuming you don't let the compiler generate the asm code; you do it yourself!
Glenn

LVL 22

Expert Comment

ID: 2697587
>>  if you programmed the Intel coprocessor yourself,
>> and, as ntdragon correctly pointed out, if the sums
>> were actually trying to compute the exact same sum,
>> you would get identical results
No, you would not.

There is no difference between writing floating point assembly directly and having the compiler write it for you.  (Well, there is one difference--it's easier to have the compiler do it.)

Guys, pay attention.

A) This is an academic question.  The person that invented the situation already knew the answer.   That means we can help, but we can't give away the answer.

B) This is not some sort of bug or compiler inadequacy.  This is a fundamental issue of floating point calculations.

Author Comment

ID: 2697900
OK, friends, thanks for your help.  It is good work locating the bugs in my programs; please help me fix them.  Later we will use this code in the mathematical application.

Thanks,

Jairo
Colombia

LVL 22

Expert Comment

ID: 2697933
There is a bug; ntdragon described it.  But fixing that is not enough to get the two algorithms to agree.

What was the assignment you were given?  What is the material you've been studying from?

Author Comment

ID: 2698381
Sorry, this is not an assignment, and this is not study material; it is sample code to test some calculations. We are working on a local mathematical application to help our geological group manage geological samples, and somebody asked a question about summing series of big numbers left to right versus right to left. I am a novice with C++ and don't have a clear answer.

This is an Expert Group, and we are looking for help, not for a police investigation or an academic trial.

Thanks,

Jairo Cardenas
Republic of Colombia
www.ecopetrol.com.co

LVL 84

Expert Comment

ID: 2698510
Summing from the smallest absolute values first will generally be more accurate, but if you care about the accuracy of your result, you should also analyze the numerical stability of your algorithms with an understanding of the error characteristics of the numbers on your machine.

LVL 22

Expert Comment

ID: 2698533
Sorry for the mixup, but this is a frequently asked academic question.  (It occurs in textbooks, etc.)

The answer is that floating point numbers have very limited precision; a typical 80-bit long double stores about 19 decimal digits of precision. (In this case "decimal" means base 10, not "decimal" as in after the decimal point. A floating point number stores the most significant digit it needs to and about 19 digits after that. So if a number is very large, like 1234567890.0, there are 10 digits before the decimal point, so it can store only about 9 digits after the decimal point. But in a small number, like 0.000000001234567890, it can store about 19 digits after that first 1, so it can store many more decimal digits in this case.   In other words, it stores about 19 decimal digits starting from the most significant digit it has to store.   This means that when the number being stored is small, it is stored with greater absolute precision than when the number is big.)

Okay?

Well, if you start adding with the small numbers first (like (1/2)^100, which is very small), you will be working with small numbers that have much greater absolute precision (they have more digits behind the decimal point).  As you add these numbers you get bigger and bigger values and start losing absolute precision (digits after the decimal point).  However, the contribution of those early terms (the small numbers) is not completely lost; it is just rounded a little.  So the result is a little rounded, a little inaccurate, but not seriously incorrect.

However, if you add starting with the big numbers, you get much worse results.  In this case you start out with big numbers that have lower absolute precision.  By the time you reach the smaller numbers, the last digit stored of the running total is actually bigger than the first digit of the small number, so the small number is just rounded to 0.  So what happens is that most of the smaller numbers in the series simply get ignored, because they are too small to be added into the larger total.

LVL 3

Expert Comment

ID: 2698673
Just because a number, in base 10, requires 18+ digits does NOT mean it can't be represented in the FPU exactly!
Two simple examples:
1. 0.1 (base 10) can't be stored exactly, yet it needs only 1 digit to write down.
2. 2^100 requires 31 digits (base 10), yet it CAN be represented EXACTLY.
That was my earlier point.  ALL the numbers he wants to sum are represented exactly, and yet the computed answers still come out off.  BUT note the very particular sum he is summing - change the 2 to a 5 and even the individual terms are off.
Glenn


LVL 22

Expert Comment

ID: 2698764
>> does NOT mean it can't be represented in
>> the FPU exactly!
??? I think you mean it "can", not "can't", right?   If so, I would agree, but that still is not what he is detecting.  This is not a rounding error in the individual terms.

I'm not following you.

The problem occurs because FP is unable to add a large magnitude number to a small magnitude number accurately.   When the difference in magnitude exceeds about 19 orders (for an 80-bit long double), the result is simply the large number, unchanged.  That is the source of the problem.

LVL 3

Expert Comment

ID: 2698893
That wasn't a typo - it is "can't".  Like I said, 2^100, even though it is 31 digits long, is represented in the FPU EXACTLY.
Glenn

LVL 22

Expert Comment

ID: 2699879
Right. I agree that some numbers can be represented exactly.  I agree that some numbers can't be represented exactly.

But no number that has more than about 21 orders of magnitude between its least significatn and most significant digit can be represented exactly.
This problem is this solution has about 31 orders of magnitiude between the  first and last significant digit.   Using assembly isn't going to change that.

>> 2^100, eventhough is 31 digits long,
But it is only 1 significant digit (in both binary and decimal).
0

Author Comment

ID: 2700286
Friends, this discussion is very academic, excellent, and very clear for our group.
>> Using assembly isn't going to change that.
nietod talks about assembly; my question is: do these errors appear only with high-level languages, as in our question, or do the same errors exist in assembly?

Regards,

Jairo

LVL 22

Expert Comment

ID: 2700358
>> The errors are presented only with hi-level languages
>> as our question??  Or with assembly  exist
>> the same errors?
This is a fundamental issue of floating point numbers.  It has nothing to do with the language, so the same errors will exist in assembly as in C++ or any other language.  I'm not sure why GlennDean said that, because it clearly is not true, and he does not seem to be one to make that sort of mistake.

The only solution is to use FP numbers that have sufficiently large precision (which often is not a choice).   Otherwise, try to structure the mathematical operations to give you the best results.   In the case of sums, you should sort the numbers to be summed by their absolute value before adding.  This will yield the most accurate result.

Since you probably have a scientific background, you are probably familiar with the rules for addition (and other operations) with significant digits, right?  I could have explained it with that in mind.  When you add two numbers, you have to round the result at the position of the least significant digit of the number whose least significant digit is highest.  The same is true with FP numbers.  However, FP numbers don't vary the number of significant digits they store.  An 80-bit FP number stores about 19 significant (decimal) digits.  So when a number is large, its least significant digit is also large (about 19 digits down from the most significant digit).  So when you add a large number and a small number, the rounding occurs at the least significant digit of the large number, which can actually be BEFORE (bigger than) the most significant digit of the little number.  It is like adding, with only two significant digits available,

1.0*10^1 + 1.0*10^-2

The result is still 1.0*10^1.


Author Comment

ID: 2700876
OK, so our solution is to do the mathematical operations with enough precision.
Right??

Jairo


LVL 22

Accepted Solution

nietod earned 150 total points
ID: 2700933
>> our solution is to do the mathematical
>> operations with enough precision
That would be ideal, but you probably don't have the option.  Most PCs have only 80-bit floating point support, which is not enough for this problem.  Mainframes may have 128-bit or even 256-bit floating point numbers, which would be enough for this case--although not for all cases.

The only other choice is to not use FP.  There are ways of representing numbers that can potentially avoid this problem, like BCD, but they require much more work and tend to be substantially slower.

Assuming you are limited to 80-bit FP (or any FP), your best bet is to ensure the math is done by starting with the numbers with the lowest absolute value and proceeding to the highest.  One of your algorithms does that automatically.  The other does the opposite, which is the worst thing you can do.
