Solved

Calculating Standard Deviation inside Excel.

Posted on 2016-10-24
5
50 Views
Last Modified: 2016-11-04
Hi

I need to calculate the Standard Deviation Values that would represent the SD Values of

+4
+3
+2
+1
0
-1
-2
-3
-4

based on the data that I have in Column J. The formulas needs to be kept inside the Orange Box in Column J itself. I have attached the sample excel sheet with the data.

Please check the following thread for reference -
https://www.experts-exchange.com/questions/28977905/What-is-the-Difference-between-Standard-Deviations-and-z-scores-calculated-within-Excel.html

I am using Excel 2016 Professional Plus x64

Thanks
Standard-Deviation.xlsx
0
Comment
Question by:Wanu San
  • 3
  • 2
5 Comments
 

Author Comment

by:Wanu San
Comment Utility
I have been able to calculate the Standard Deviation Values for each data set respectively, by following the suggestion given by crystal here -
https://www.experts-exchange.com/questions/28978630/Calculating-Z-SCORE-inside-Excel.html

But I am still stuck at the last part, now how do I get the Standard Deviation Values from "+4 to zero to -4", in the orange box ?

Thanks for any ideas.
0
 
LVL 25

Expert Comment

by:Fred Marshall
Comment Utility
The problem isn't very well stated.  So, one has to make some assumptions about what it means.  I believe it's reasonable to state the problem this way:

"Provide the value for the underlying distribution of the data at the number of standard deviations given in column F."

One might (or might not) decide that the data is "zero mean" effectively .. otherwise it is not.
Or, one might (or might not) decide that the underlying distribution of the data is "zero mean" .. it may well be as it's close to zero.  But what's "close" in this case?  It's up to the eye of the beholder.

So, a simplified view might be that one wants to know the values at +/- 1,2,3,4 standard deviations of the underlying distribution assumed or declared to have zero mean.  In this case, the formulas in Column J would be:
J12==F12*STDEV.P(J$30:J$141)
etc.

A more precise view might be:
j12=F12*STDEV.P(J$30:J$141)+AVERAGE(J$30:J$141)
etc.

The latter formulation biases the results according to the offset of the mean value of the data.
It still assumes a symmetric distribution around the mean which is implied by 1,2,3,4 standard deviations.  Is that reasonable in this case?
In that regard, it's interesting to note that there is NO data value that reaches +4 sigma but it might be said that the underlying distribution could reach that value.
After all, there are only 112 values given and the probability of a value occurring at 4 sigma is low.
0
 

Author Comment

by:Wanu San
Comment Utility
"Provide the value for the underlying distribution of the data at the number of standard deviations given in column F."
Thank you so much for your comment, Fred Marshall, you have understood the requirement 100 % correctly.

I really do not know if the data is zero mean or not. If we assume that the data is not symmetrically distributed around the mean, then does that means we cannot use Standard Deviations based concepts on this data, in order to separate the middle values from outlier values etc. ? I am just experimenting, I am not sure about all this.

In that regard, it's interesting to note that there is NO data value that reaches +4 sigma but it might be said that the underlying distribution could reach that value.
After all, there are only 112 values given and the probability of a value occurring at 4 sigma is low.

It is perfectly fine, if we do not have any value within this data sample, which would have a +4 sigma. We just need to know what value would have +4 sigma, even if it is not present within the current sample.

J12==F12*STDEV.P(J$30:J$141)
etc.
A more precise view might be:
j12=F12*STDEV.P(J$30:J$141)+AVERAGE(J$30:J$141)
Regarding these 2 different formulas, would you please tell what EXACT DIFFERENCE, does that makes. And which one is preferred over the other, if we are trying to find the middle values and the outlier values from a data sample, by using the concept of standard deviations.

Thanks again for giving it a shot.
0
 
LVL 25

Accepted Solution

by:
Fred Marshall earned 500 total points
Comment Utility
I really do not know if the data is zero mean or not.
Well you already have the data in Excel and AVERAGE will give the mean and it comes out around 0.2..
Regarding these 2 different formulas, would you please tell what EXACT DIFFERENCE, does that makes. And which one is preferred over the other, if we are trying to find the middle values and the outlier values from a data sample, by using the concept of standard deviations.
The exact difference is a constant equal to AVERAGE(J$30:J$141).  The idea is that there is an UNDERLYING normal distribution around some mean value.  i.e. where does the peak of the normal curve reside?  Since the average of the data is only around 0.2 then one might choose to ignore it OR one might choose to *decide* for a variety of real-world reasons that the process would be zero mean.  So, expressed "in pictures", this added constant value shifts the normal distribution so the peak is located at the mean value of the DATA.
If a difference of 0.2 matters then you might include it.  If not, then you might not include it.  To me it hardly makes any difference but only you will know if it changes the desired outcome for you in any material way.
If we assume that the data is not symmetrically distributed around the mean, then does that means we cannot use Standard Deviations based concepts on this data,
No but it may mean that a normal distribution isn't perhaps the best UNDERLYING MODEL for the DATA.

Philosophically speaking:
You have the DATA.
You conjure up a MODEL that matches the DATA as well as you can or need.  This MODEL represents what I called earlier the UNDERLYING DISTRIBUTION.
It is very common to use a normal distribution as a model because the real world often behaves that way - or at least behaves that way well enough.  And there are good arguments for this.
But you could create all sorts of models using something like (for example) a Weibull distribution with the parameters for that model selected by matching it to the data.
[See: http://www.niar.wichita.edu/coe/NCAMP_Documents/Publications/NCAMP_Techincal_Presentations/Distribution_for_small_sample_sizes_and_fatigue_data.pdf]
But, as I mentioned before, with only 112 values given you are going to be making assumptions when trying to fit a function all the way out to +/-4 sigma.  So using a normal distribution is not only easier but may also be the best choice for matching and predicting reality.
1
 

Author Closing Comment

by:Wanu San
Comment Utility
Thank you Fred Marshall
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Have you ever thought of installing a power system that generates solar electricity to power your house? Some may say yes, while others may tell me no. But have you noticed that people around you are now considering installing such systems in their …
We are taking giant steps in technological advances in the field of wireless telephony. At just 10 years since the advent of smartphones, it is crucial to examine the benefits and disadvantages that have been report to us.
The view will learn how to download and install SIMTOOLS and FORMLIST into Excel, how to use SIMTOOLS to generate a Monte Carlo simulation of 30 sales calls, and how to calculate the conditional probability based on the results of the Monte Carlo …
This Micro Tutorial will demonstrate how to use a scrolling table in Microsoft Excel using the INDEX function.

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now