Thanks for the answer, cookre.
But I'm not sure what you mean in the second paragraph of your answer ('in particular when large ... due to normalization').
I need to write a program that does a lot of iterative calculations using very small numbers (as small as 10E-50). I need to store thousands of these in arrays. Is this possible using C++ (Borland C++ Builder)? Or will the calculations take ages?
He means that floats and doubles have a restricted range of precision, and that this precision is not storage related but due to a difference in the way humans and computers do math.
Rule of thumb precision:
On a 32 bit platform with a relatively standard compiler, a float will hold about 7 significant decimal digits of precision.
A double will hold about 15 to 16.
Impact of "mantissa normalization":
Humans use base ten, and as a result we cannot write 1/3 exactly:
( 1/3 = .3333333, .3333333 + .3333333 + .3333333 = .9999999 <> 1 )
Computers use base 2, which cannot store 1/3 exactly either, but here the rounding happens to work out and the sum comes back to exactly 1:
#include <stdio.h>
int main(void) {
    int i;
    float y, x = 0.0f;
    y = 1.0f / 3;               /* nearest float to one third */
    for (i = 0; i < 3; i++) {
        x += y;                 /* running total, rounded to float each step */
        printf("%15.10f\n", x);
    }
    return 0;
}
---------------
0.3333333433
0.6666666865
1.0000000000
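But they have this nasty problem with 5's (1/100 has no exact base 2 representation, and this time the errors pile up):
#include <stdio.h>
int main(void) {
    int i;
    float y, x = 0.0f;
    y = 1.0f / 100;             /* nearest float to 0.01 */
    for (i = 0; i < 100; i++) {
        x += y;                 /* each addition rounds, and the error accumulates */
        printf("%15.10f\n", x);
    }
    return 0;
}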
----------------
0.0099999998
0.0199999996
[... snip ...]
0.9799993634
0.9899993539
0.9999993443
Whoops, better open that savings account and collect those fractions!
Enjoy.
PS: You can demonstrate this effect in the NT4/Win 95 "calc" program simply by adding .01 to itself and hammering the "=" key 98 times. MS subsequently fixed calc by adding a round-up routine.
-Bill
Normalization is a standard practice in floating point formats that says that the high order bit (or nibble, depending on CPU) of the mantissa will be non-zero. This is analogous to our standard scientific notation:
m * 10 ** c
that says a non-zero mantissa 'm' will be:
1 <= m < 10
For example, we express:
54321 * 10 ** 0
as:
5.4321 * 10 ** 4
Given a fixed fraction length, normalization will either lose useful digits or gain misleading digits.
When doing addition or subtraction of floats, the mantissa of one of the operands must be shifted left or right until the characteristics agree (point alignment). If we have, say, 5-digit fractions and the characteristics differ by 3, the shifted operand is left with only 2 meaningful digits.
What this all means for you is that highly iterative algorithms using floating point tend to lose mantissa accuracy, and they lose it even faster as characteristic magnitude differences increase.
I'm reminded of an atmospheric modeling program I converted from IBM to Univac mainframes many years ago. There was a full order of magnitude difference between the two because of differences in their respective floating formats, yet, within the context of each box, each result was 'correct'. The poor engineers just hadn't taken the vagaries of finite state arithmetic into account.
That's why whenever precision and accuracy are CRITICAL, one doesn't use floating point. Alas, the alternative, arbitrary length integer arithmetic, while precise, is so very much slooooooooower.
Yes, nothing can be faster than assembly, but I was talking about high-level languages, and with a good C compiler you can be almost as fast as assembly.
About the precision point: using very small numbers can lead to unexpected results due to representation errors, so you have to be very careful here.
djek2000:
Re-reading your question I was struck by the word "thousands"
Since floating point arithmetic is an Intel processor's weak point, um, is it possible to recast your problem?
Eg: if all your numbers are between 0.001e-50 and 9.999e-50
can't we simply say
int x; // x ranges from 1 to 9999, in units of 0.001e-50
... calcs
printf("%e", x * 1e-53);
Thus your operational and storage semantic is integer, but presentational is double.
The clear advantage is a shift to (smaller) int datatypes with immensely faster arithmetic.
-Bill
I was more concerned about the 'weak' part, and thought back on all the ill-informed bad press over the so-called Pentium FP flaw a few years back. The great eye-opener for me was how many otherwise experienced programmers were actually concerned about it.
I guess that comes from the presumption that anyone using floating point would be familiar with its advantages and disadvantages and make an informed choice. I would hazard that if you were to quiz everyone who's been programming for 5 or more years, regardless of CPU or language, fewer than half would have a sound working knowledge of FP and fewer than 10% would really understand it (for example, be able to explain characteristic biasing or mantissa normalization).
These values could vary slightly depending on hardware, but these are typical for double precision floating point (from float.h):
Approximate number of decimal digits: 15
Smallest value you can add to one that makes any difference: 2.2204460492503131e-016 (this is called the "epsilon")
Largest value that can be handled: 1.7976931348623158e+308
Smallest positive (normalized) value that can be handled: 2.2250738585072014e-308
This should give you a clearer idea of how precision affects your calculations.
(I will use funny notation here to try to keep things lining up)
500000000000000 +
000000000000001 =
500000000000001 (correct)
(note 15 digits above)
BUT, say your number is more than 15, say 20 digits:
50000000000000000000 +
00000000000000000001 =
50000000000000000000
                ^^^^ chopped off
See? It chops off the value at approximately the 15th digit.
BUT, the following would work:
50000000000000000000 +
00000002000000000000 =
50000002000000000000 (correct)
This is the meaning of "normalization". It slides the result to the left, until the highest bit is set. In fact, the highest bit is not stored, it is always a one (a special encoding stores a zero value).
Normalization also means sliding less significant bits off the right, causing the "chop off" described above.
A good way to visualize what is happening is that it gets the exact right answer, but it only keeps the first 15 or so digits of the answer. And there is some "granularity" that can cause the lowest digits to be off a bit.
I am leaving off a lot of details, including the fact that the values are stored in base 2, that the "e+" exponent is really a power of two, the sign, exponent biasing, etc. Way off topic from your question.
by: cookre, posted on 2003-09-02 at 11:16:30, ID: 9268999
Presuming you're using double or long double, the compiler will generate hardware floating point code. As long as you don't mix types within the most heavily iterated code, the generated code will be practically as good as assembly.
I trust you're familiar with the problems with floating point in highly iterative algorithms - in particular when large characteristic differences cause loss of mantissa precision due to normalization.