Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17


Normal Distribution

Posted on 2004-04-01
Medium Priority
Last Modified: 2012-08-13

I'm a programmer who's in the process of converting a tool built in Excel into a C# program.  I've managed to port everything except for the NORMDIST() function, which is scaring the hell out of me.

I need to work out the cumulative normal distribution given x, the mean and the standard deviation.  I posed this question in one of the C# question areas and got what I thought was an answer to my question:

But it turned out not to work, for instance:

x = 2.01490302054226
mean = 3.24088418968677
std dev = 0.427899387582919

Using NORMDIST(x,mean,stddev,TRUE) Excel gives me: 0.00208435088016279
The C# function gives me: 556735840.808829

Sometimes the C# function is spot on, sometimes it's way off like above.

I don't need anyone to write the C# function for me (although it would be nice if you did), I just need it explaining in english how I go about calculating it using basic maths.

If anyone can help, or point me in the right direction?

Question by:littlecharva
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
  • 2
  • +2

Expert Comment

ID: 10731809
Look up NORMDIST in the Excel help.  It looks like the formula given in the help is somewhat different from the one computed in your C# function.  Specifically, the value that the exponent is raised to.  According to this formula, the exponent should be raised to:

-pow((x-u),2) / (2*pow(deviation,2))

the remainder of your formula looks fine.

Expert Comment

ID: 10731912

Please disregard my last comment, that is only for the formula where cumulative - FALSE.  I'll see if I can find anything else.

Expert Comment

ID: 10732864
The formula
  Math.Exp(-(Math.Pow((x - mean)/deviation, 2)/2))/Math.Sqrt(2*Math.PI)/deviation;
is perfectly correct for the normal distribution, but it gives the probability mass function.  That is, it behaves like NORMDIST with the 4th parameter hardcoded to FALSE.  We are talking about the bell-shaped curve that is shown in many places when we have to do with random events.

The cumulative function is what you obtain when you integrate the mass function from
-infinity to x.  That is, you calculate the area under the bell-shaped curve that is on the left of x.  This function is not available in analytical form.  Its values can only be obtained numerically...

I don't know how Excel calculates it.  I probably interpolates a table.  Fortunately this function is very well behaved: it starts very flat for far left of the mean (i.e. a few times the deviation; grows smoothly and flattens up again for x far on the right of the mean.  It is symmetrical with respect to the mean, where it reaches the value 0.5 .

It is therefore reasonable to use a table and interpolate it linearly or with a second order polynom.

Let's call t = (x - mean)/deviation and G(t) the cumulative function.  Here are some values:

G(0) = 0.5
G(0.1) = 0.539827896
G(0.2) = 0.579259687
G(0.3) = 0.617911357

If you estimate very crudely G(0.2)est = (G(0.3) + G(0.1))/2 = 0.578869627
you see that the estimate differs from the correct value only by 0.07% That is SEVEN PARTS OUT OF 10,000!

A "curvier" area is around t = 1:
G(0.5) = 0.691462467
G(1.0) = 0.84134474
G(1.5) = 0.933192771
This time the rough interpolation G(1.0)est = (G(0.5) + G(1.5))/2 gives an accuracy of about 3.5%.

How big a table you need to build really depends on the accuracy you need to have.

A linear interpolation (better than the rough one) is done by finding the equation of the straight line between the two points:
  A*t + B = f
You impose that it passes through G(0.5) and G(1.5):
  A*0.5 + B = 0.691462467
  A*1.5 + B = 0.933192771
and resolving for A and B you get:
  f = 0.241730304*t + 0.570597315
valid for 0.5 < t < 1.5

You have a better interpolation if you approximate the curve with a parabola:
  f=A*t^2 + B*t + C
instead of a straight line.  To do this you need three consecutive points (e.g. 0.5, 1.0, and 1.5):
  A*0.25 + B*0.5 + C = 0.691462467
  A*1.00 + B*1.0 + C = 0.933192771
  A*2.25 + B*1.5 + C = 0.691462467

I am not going to find the coefficients here, but it is straightforward to obtain values for A, B, and C.

You should build a grid of function values at intervals D of the variable t and pre-calculate A, B, and C for each triplet of points.  D should be small enough to give you enough accuracy (I would recommend that you use constant Ds instead of trying to optimise the density of points, because you would waste time determining what triplet to use).

The "curviest" part of the cumulative function is where its second derivative is largest, but that is the first derivative of the mass function and can be calculated.  If I didn't make mistakes (!) this should happen at  |t| = 3^(1/4) = 1.316 . Therefore, you should check your accuracy there.

When you estimate a set of A/B/C from a triplet T - D, T, and T + D, use them for
  T - D/2 < t <= T + D/2
to have the best results.

Build the table to cover the interval -5 < t < +5 and use some very coarse approximation outside that.  For t = -5 the cumulative function is about 3*10^ -7.  If you go to +/- 6 you will have 9 significant digits guaranteed and probably can just set the value to 0 outside that.

Only make a table for 1/2 of the curve: -6 < t < 0 and then calculate the other half with the formula f(t) = 1 - f(-t)
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

LVL 84

Expert Comment

ID: 10733162
#include <stdlib.h>
float NORMDIST(float mean,float stddev){
    static int iset=0;
    static float gset;
    float fac,rsq,v1,v2;
    if( iset==0 ){
            v1=2.0*rand()/RAND_MAX - 1.0;
            v2=2.0*rand()/RAND_MAX - 1.0;
            rsq = v1*v1+v2*v2;
       }while( rsq >= 1.0 || rsq == 0.0 );
        fac = sqrt(-2.0*log(rsq)/rsq);
        iset = 0;
        return mean+stddev*v2*fac;
        return mean+stddev*gset;
LVL 84

Accepted Solution

ozo earned 1000 total points
ID: 10737443
Sorry, I misunderstood. I thought NORMDIST was supposed to return a normally distributed random variable

This version should give the integral of the normal distribution, which is what I think you wanted

#include <math.h>
float NORMDIST(float x,float mean,float stddev,int cumulative){
    x = (x-mean)/stddev;
    if( cumulative ){
        float t,z,ans;
        z = fabs(x)/sqrt(2.0);
        t = 1.0/(1.0+0.5*z);
        ans = t*exp(-z*z-1.26551223+t*(1.00002368+t*(0.37409196+t*(0.09678418+
        return x <= 0.0 ? ans : 1.0-ans;
        return exp(-x*x/2.0)/sqrt(2.0*3.14159265358979);

Author Comment

ID: 10739229
Cheers ozo, you get the points for making it the easiest for me, but thanks to everyone else for their assistance.


Expert Comment

ID: 10739244
You say you have a working formula that, when gives you back values between 0-1 matches the values in Excel, but it sometimes gives you values out of the range [0,1]... ok, first, the values have to be between 0-1, because what you are looking for is that...

Probabiliy of getting a value less than X in an experiment witch result are distribuited as a gaussian bell centered in "mean" value (the most usual experiment result) and of a "width" (the bell) of "stddev" (how far result use to be from the "mena" result).

...and this probabilities are always coded from 0 (imposible) to 1 (for sure).

So, if your formula is working great for some numbers "near" the center of the bell (the "mean") but worst for other values not so near, then you have to increase the number of iterartions from 16 to 32 or more (it will not be noticed), then give it another try on that numbers. If it worked for some number, should do for others.

If you don't want to play with 16 or 32, you could also make if undefined: Do all iterations needen UNTIL result is between 0,1 (inclusive) and difference with last iteration result is less than 0.0000001, this approach will give you a more reliable formula, but it is a work that can be replaced be using 999 iterations instead of 16!!!

Ofcurse it may not work, but it is very easy to try and I would bet for it!

Expert Comment

ID: 10739249
Opps! I didn't read ozo before writting... great job, ozo!!

Expert Comment

ID: 10739297
Funny. After all the effort of reasoning about interpolating tables, I just found a function which is very similar to ozo's.  You will find the C code in

It only has 5th powers of t. It will teherfore require less computing.  You might like to have a look at it.

Featured Post

New feature and membership benefit!

New feature! Upgrade and increase expert visibility of your issues with Priority Questions.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Foreword (May 2015) This web page has appeared at Google.  It's definitely worth considering! How to Know You are Making a Difference at EE In August, 2013, one …
Article by: Nicole
This is a research brief on the potential colonization of humans on Mars.
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…
Finds all prime numbers in a range requested and places them in a public primes() array. I've demostrated a template size of 30 (2 * 3 * 5) but larger templates can be built such 210  (2 * 3 * 5 * 7) or 2310  (2 * 3 * 5 * 7 * 11). The larger templa…
Suggested Courses

722 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question