Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
Solved

# Calculating Standard Deviation inside Excel.

Posted on 2016-10-24
Medium Priority
169 Views
Hi

I need to calculate the Standard Deviation Values that would represent the SD Values of

+4
+3
+2
+1
0
-1
-2
-3
-4

based on the data that I have in Column J. The formulas needs to be kept inside the Orange Box in Column J itself. I have attached the sample excel sheet with the data.

https://www.experts-exchange.com/questions/28977905/What-is-the-Difference-between-Standard-Deviations-and-z-scores-calculated-within-Excel.html

I am using Excel 2016 Professional Plus x64

Thanks
Standard-Deviation.xlsx
0
Question by:Wanu San
[X]
###### Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

• Help others & share knowledge
• Earn cash & points
• 3
• 2

Author Comment

ID: 41859553
I have been able to calculate the Standard Deviation Values for each data set respectively, by following the suggestion given by crystal here -
https://www.experts-exchange.com/questions/28978630/Calculating-Z-SCORE-inside-Excel.html

But I am still stuck at the last part, now how do I get the Standard Deviation Values from "+4 to zero to -4", in the orange box ?

Thanks for any ideas.
0

LVL 26

Expert Comment

ID: 41859627
The problem isn't very well stated.  So, one has to make some assumptions about what it means.  I believe it's reasonable to state the problem this way:

"Provide the value for the underlying distribution of the data at the number of standard deviations given in column F."

One might (or might not) decide that the data is "zero mean" effectively .. otherwise it is not.
Or, one might (or might not) decide that the underlying distribution of the data is "zero mean" .. it may well be as it's close to zero.  But what's "close" in this case?  It's up to the eye of the beholder.

So, a simplified view might be that one wants to know the values at +/- 1,2,3,4 standard deviations of the underlying distribution assumed or declared to have zero mean.  In this case, the formulas in Column J would be:
J12==F12*STDEV.P(J\$30:J\$141)
etc.

A more precise view might be:
j12=F12*STDEV.P(J\$30:J\$141)+AVERAGE(J\$30:J\$141)
etc.

The latter formulation biases the results according to the offset of the mean value of the data.
It still assumes a symmetric distribution around the mean which is implied by 1,2,3,4 standard deviations.  Is that reasonable in this case?
In that regard, it's interesting to note that there is NO data value that reaches +4 sigma but it might be said that the underlying distribution could reach that value.
After all, there are only 112 values given and the probability of a value occurring at 4 sigma is low.
0

Author Comment

ID: 41859693
"Provide the value for the underlying distribution of the data at the number of standard deviations given in column F."
Thank you so much for your comment, Fred Marshall, you have understood the requirement 100 % correctly.

I really do not know if the data is zero mean or not. If we assume that the data is not symmetrically distributed around the mean, then does that means we cannot use Standard Deviations based concepts on this data, in order to separate the middle values from outlier values etc. ? I am just experimenting, I am not sure about all this.

In that regard, it's interesting to note that there is NO data value that reaches +4 sigma but it might be said that the underlying distribution could reach that value.
After all, there are only 112 values given and the probability of a value occurring at 4 sigma is low.

It is perfectly fine, if we do not have any value within this data sample, which would have a +4 sigma. We just need to know what value would have +4 sigma, even if it is not present within the current sample.

J12==F12*STDEV.P(J\$30:J\$141)
etc.
A more precise view might be:
j12=F12*STDEV.P(J\$30:J\$141)+AVERAGE(J\$30:J\$141)
Regarding these 2 different formulas, would you please tell what EXACT DIFFERENCE, does that makes. And which one is preferred over the other, if we are trying to find the middle values and the outlier values from a data sample, by using the concept of standard deviations.

Thanks again for giving it a shot.
0

LVL 26

Accepted Solution

Fred Marshall earned 2000 total points
ID: 41860802
I really do not know if the data is zero mean or not.
Well you already have the data in Excel and AVERAGE will give the mean and it comes out around 0.2..
Regarding these 2 different formulas, would you please tell what EXACT DIFFERENCE, does that makes. And which one is preferred over the other, if we are trying to find the middle values and the outlier values from a data sample, by using the concept of standard deviations.
The exact difference is a constant equal to AVERAGE(J\$30:J\$141).  The idea is that there is an UNDERLYING normal distribution around some mean value.  i.e. where does the peak of the normal curve reside?  Since the average of the data is only around 0.2 then one might choose to ignore it OR one might choose to *decide* for a variety of real-world reasons that the process would be zero mean.  So, expressed "in pictures", this added constant value shifts the normal distribution so the peak is located at the mean value of the DATA.
If a difference of 0.2 matters then you might include it.  If not, then you might not include it.  To me it hardly makes any difference but only you will know if it changes the desired outcome for you in any material way.
If we assume that the data is not symmetrically distributed around the mean, then does that means we cannot use Standard Deviations based concepts on this data,
No but it may mean that a normal distribution isn't perhaps the best UNDERLYING MODEL for the DATA.

Philosophically speaking:
You have the DATA.
You conjure up a MODEL that matches the DATA as well as you can or need.  This MODEL represents what I called earlier the UNDERLYING DISTRIBUTION.
It is very common to use a normal distribution as a model because the real world often behaves that way - or at least behaves that way well enough.  And there are good arguments for this.
But you could create all sorts of models using something like (for example) a Weibull distribution with the parameters for that model selected by matching it to the data.
[See: http://www.niar.wichita.edu/coe/NCAMP_Documents/Publications/NCAMP_Techincal_Presentations/Distribution_for_small_sample_sizes_and_fatigue_data.pdf]
But, as I mentioned before, with only 112 values given you are going to be making assumptions when trying to fit a function all the way out to +/-4 sigma.  So using a normal distribution is not only easier but may also be the best choice for matching and predicting reality.
1

Author Closing Comment

ID: 41875086
Thank you Fred Marshall
0

## Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

: Microsoft Office Collaborate for free and online versions of Microsoft  Word, Excel, Powerpoint, OneNote, Onedrive , Email, Calendar etc. In short we can say that Microsoft office is a suite of servers, applications and services developed by  Micr…
Excel can be a tricky bit of software to get your head around. Whilst you’ll be able to eventually get to grips with the basic understanding of how to get by, there are a few Excel tips that not everybody will even know about let alone know how to d…
The viewer will learn how to create a normally distributed random variable in Excel, use a normal distribution to simulate the return on an investment over a period of years, Create a Monte Carlo simulation using a normal random variable, and calcul…
This Micro Tutorial demonstrate the bugs in Microsoft Excel for Mac with Pivot Charts.
###### Suggested Courses
Course of the Month9 days, 18 hours left to enroll