Normal Distribution

Posted on 2004-04-01
Last Modified: 2012-08-13

I'm a programmer who's in the process of converting a tool built in Excel into a C# program.  I've managed to port everything except for the NORMDIST() function, which is scaring the hell out of me.

I need to work out the cumulative normal distribution given x, the mean and the standard deviation.  I posed this question in one of the C# question areas and got what I thought was an answer to my question:

But it turned out not to work, for instance:

x = 2.01490302054226
mean = 3.24088418968677
std dev = 0.427899387582919

Using NORMDIST(x,mean,stddev,TRUE) Excel gives me: 0.00208435088016279
The C# function gives me: 556735840.808829

Sometimes the C# function is spot on, sometimes it's way off like above.

I don't need anyone to write the C# function for me (although it would be nice if you did), I just need it explaining in english how I go about calculating it using basic maths.

If anyone can help, or point me in the right direction?

Question by:littlecharva
  • 2
  • 2
  • 2
  • +2

Expert Comment

ID: 10731809
Look up NORMDIST in the Excel help.  It looks like the formula given in the help is somewhat different from the one computed in your C# function.  Specifically, the value that the exponent is raised to.  According to this formula, the exponent should be raised to:

-pow((x-u),2) / (2*pow(deviation,2))

the remainder of your formula looks fine.

Expert Comment

ID: 10731912

Please disregard my last comment, that is only for the formula where cumulative - FALSE.  I'll see if I can find anything else.

Expert Comment

ID: 10732864
The formula
  Math.Exp(-(Math.Pow((x - mean)/deviation, 2)/2))/Math.Sqrt(2*Math.PI)/deviation;
is perfectly correct for the normal distribution, but it gives the probability mass function.  That is, it behaves like NORMDIST with the 4th parameter hardcoded to FALSE.  We are talking about the bell-shaped curve that is shown in many places when we have to do with random events.

The cumulative function is what you obtain when you integrate the mass function from
-infinity to x.  That is, you calculate the area under the bell-shaped curve that is on the left of x.  This function is not available in analytical form.  Its values can only be obtained numerically...

I don't know how Excel calculates it.  I probably interpolates a table.  Fortunately this function is very well behaved: it starts very flat for far left of the mean (i.e. a few times the deviation; grows smoothly and flattens up again for x far on the right of the mean.  It is symmetrical with respect to the mean, where it reaches the value 0.5 .

It is therefore reasonable to use a table and interpolate it linearly or with a second order polynom.

Let's call t = (x - mean)/deviation and G(t) the cumulative function.  Here are some values:

G(0) = 0.5
G(0.1) = 0.539827896
G(0.2) = 0.579259687
G(0.3) = 0.617911357

If you estimate very crudely G(0.2)est = (G(0.3) + G(0.1))/2 = 0.578869627
you see that the estimate differs from the correct value only by 0.07% That is SEVEN PARTS OUT OF 10,000!

A "curvier" area is around t = 1:
G(0.5) = 0.691462467
G(1.0) = 0.84134474
G(1.5) = 0.933192771
This time the rough interpolation G(1.0)est = (G(0.5) + G(1.5))/2 gives an accuracy of about 3.5%.

How big a table you need to build really depends on the accuracy you need to have.

A linear interpolation (better than the rough one) is done by finding the equation of the straight line between the two points:
  A*t + B = f
You impose that it passes through G(0.5) and G(1.5):
  A*0.5 + B = 0.691462467
  A*1.5 + B = 0.933192771
and resolving for A and B you get:
  f = 0.241730304*t + 0.570597315
valid for 0.5 < t < 1.5

You have a better interpolation if you approximate the curve with a parabola:
  f=A*t^2 + B*t + C
instead of a straight line.  To do this you need three consecutive points (e.g. 0.5, 1.0, and 1.5):
  A*0.25 + B*0.5 + C = 0.691462467
  A*1.00 + B*1.0 + C = 0.933192771
  A*2.25 + B*1.5 + C = 0.691462467

I am not going to find the coefficients here, but it is straightforward to obtain values for A, B, and C.

You should build a grid of function values at intervals D of the variable t and pre-calculate A, B, and C for each triplet of points.  D should be small enough to give you enough accuracy (I would recommend that you use constant Ds instead of trying to optimise the density of points, because you would waste time determining what triplet to use).

The "curviest" part of the cumulative function is where its second derivative is largest, but that is the first derivative of the mass function and can be calculated.  If I didn't make mistakes (!) this should happen at  |t| = 3^(1/4) = 1.316 . Therefore, you should check your accuracy there.

When you estimate a set of A/B/C from a triplet T - D, T, and T + D, use them for
  T - D/2 < t <= T + D/2
to have the best results.

Build the table to cover the interval -5 < t < +5 and use some very coarse approximation outside that.  For t = -5 the cumulative function is about 3*10^ -7.  If you go to +/- 6 you will have 9 significant digits guaranteed and probably can just set the value to 0 outside that.

Only make a table for 1/2 of the curve: -6 < t < 0 and then calculate the other half with the formula f(t) = 1 - f(-t)
LVL 84

Expert Comment

ID: 10733162
#include <stdlib.h>
float NORMDIST(float mean,float stddev){
    static int iset=0;
    static float gset;
    float fac,rsq,v1,v2;
    if( iset==0 ){
            v1=2.0*rand()/RAND_MAX - 1.0;
            v2=2.0*rand()/RAND_MAX - 1.0;
            rsq = v1*v1+v2*v2;
       }while( rsq >= 1.0 || rsq == 0.0 );
        fac = sqrt(-2.0*log(rsq)/rsq);
        iset = 0;
        return mean+stddev*v2*fac;
        return mean+stddev*gset;
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

LVL 84

Accepted Solution

ozo earned 250 total points
ID: 10737443
Sorry, I misunderstood. I thought NORMDIST was supposed to return a normally distributed random variable

This version should give the integral of the normal distribution, which is what I think you wanted

#include <math.h>
float NORMDIST(float x,float mean,float stddev,int cumulative){
    x = (x-mean)/stddev;
    if( cumulative ){
        float t,z,ans;
        z = fabs(x)/sqrt(2.0);
        t = 1.0/(1.0+0.5*z);
        ans = t*exp(-z*z-1.26551223+t*(1.00002368+t*(0.37409196+t*(0.09678418+
        return x <= 0.0 ? ans : 1.0-ans;
        return exp(-x*x/2.0)/sqrt(2.0*3.14159265358979);

Author Comment

ID: 10739229
Cheers ozo, you get the points for making it the easiest for me, but thanks to everyone else for their assistance.


Expert Comment

ID: 10739244
You say you have a working formula that, when gives you back values between 0-1 matches the values in Excel, but it sometimes gives you values out of the range [0,1]... ok, first, the values have to be between 0-1, because what you are looking for is that...

Probabiliy of getting a value less than X in an experiment witch result are distribuited as a gaussian bell centered in "mean" value (the most usual experiment result) and of a "width" (the bell) of "stddev" (how far result use to be from the "mena" result).

...and this probabilities are always coded from 0 (imposible) to 1 (for sure).

So, if your formula is working great for some numbers "near" the center of the bell (the "mean") but worst for other values not so near, then you have to increase the number of iterartions from 16 to 32 or more (it will not be noticed), then give it another try on that numbers. If it worked for some number, should do for others.

If you don't want to play with 16 or 32, you could also make if undefined: Do all iterations needen UNTIL result is between 0,1 (inclusive) and difference with last iteration result is less than 0.0000001, this approach will give you a more reliable formula, but it is a work that can be replaced be using 999 iterations instead of 16!!!

Ofcurse it may not work, but it is very easy to try and I would bet for it!

Expert Comment

ID: 10739249
Opps! I didn't read ozo before writting... great job, ozo!!

Expert Comment

ID: 10739297
Funny. After all the effort of reasoning about interpolating tables, I just found a function which is very similar to ozo's.  You will find the C code in

It only has 5th powers of t. It will teherfore require less computing.  You might like to have a look at it.

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Manufacturing a NPK fertlizer 7 30
A problem of optics 16 145
Vertex form of the function 8 64
Calculating Z-SCORE inside Excel. 4 86
Have you ever thought of installing a power system that generates solar electricity to power your house? Some may say yes, while others may tell me no. But have you noticed that people around you are now considering installing such systems in their …
Lithium-ion batteries area cornerstone of today's portable electronic devices, and even though they are relied upon heavily, their chemistry and origin are not of common knowledge. This article is about a device on which every smartphone, laptop, an…
This is a video describing the growing solar energy use in Utah. This is a topic that greatly interests me and so I decided to produce a video about it.
With Secure Portal Encryption, the recipient is sent a link to their email address directing them to the email laundry delivery page. From there, the recipient will be required to enter a user name and password to enter the page. Once the recipient …

914 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now