?
Solved

Calculating Standard Deviation inside Excel.

Posted on 2016-10-24
5
Medium Priority
?
155 Views
Last Modified: 2016-11-04
Hi

I need to calculate the Standard Deviation Values that would represent the SD Values of

+4
+3
+2
+1
0
-1
-2
-3
-4

based on the data that I have in Column J. The formulas needs to be kept inside the Orange Box in Column J itself. I have attached the sample excel sheet with the data.

Please check the following thread for reference -
https://www.experts-exchange.com/questions/28977905/What-is-the-Difference-between-Standard-Deviations-and-z-scores-calculated-within-Excel.html

I am using Excel 2016 Professional Plus x64

Thanks
Standard-Deviation.xlsx
0
Comment
Question by:Wanu San
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
5 Comments
 

Author Comment

by:Wanu San
ID: 41859553
I have been able to calculate the Standard Deviation Values for each data set respectively, by following the suggestion given by crystal here -
https://www.experts-exchange.com/questions/28978630/Calculating-Z-SCORE-inside-Excel.html

But I am still stuck at the last part, now how do I get the Standard Deviation Values from "+4 to zero to -4", in the orange box ?

Thanks for any ideas.
0
 
LVL 26

Expert Comment

by:Fred Marshall
ID: 41859627
The problem isn't very well stated.  So, one has to make some assumptions about what it means.  I believe it's reasonable to state the problem this way:

"Provide the value for the underlying distribution of the data at the number of standard deviations given in column F."

One might (or might not) decide that the data is "zero mean" effectively .. otherwise it is not.
Or, one might (or might not) decide that the underlying distribution of the data is "zero mean" .. it may well be as it's close to zero.  But what's "close" in this case?  It's up to the eye of the beholder.

So, a simplified view might be that one wants to know the values at +/- 1,2,3,4 standard deviations of the underlying distribution assumed or declared to have zero mean.  In this case, the formulas in Column J would be:
J12==F12*STDEV.P(J$30:J$141)
etc.

A more precise view might be:
j12=F12*STDEV.P(J$30:J$141)+AVERAGE(J$30:J$141)
etc.

The latter formulation biases the results according to the offset of the mean value of the data.
It still assumes a symmetric distribution around the mean which is implied by 1,2,3,4 standard deviations.  Is that reasonable in this case?
In that regard, it's interesting to note that there is NO data value that reaches +4 sigma but it might be said that the underlying distribution could reach that value.
After all, there are only 112 values given and the probability of a value occurring at 4 sigma is low.
0
 

Author Comment

by:Wanu San
ID: 41859693
"Provide the value for the underlying distribution of the data at the number of standard deviations given in column F."
Thank you so much for your comment, Fred Marshall, you have understood the requirement 100 % correctly.

I really do not know if the data is zero mean or not. If we assume that the data is not symmetrically distributed around the mean, then does that means we cannot use Standard Deviations based concepts on this data, in order to separate the middle values from outlier values etc. ? I am just experimenting, I am not sure about all this.

In that regard, it's interesting to note that there is NO data value that reaches +4 sigma but it might be said that the underlying distribution could reach that value.
After all, there are only 112 values given and the probability of a value occurring at 4 sigma is low.

It is perfectly fine, if we do not have any value within this data sample, which would have a +4 sigma. We just need to know what value would have +4 sigma, even if it is not present within the current sample.

J12==F12*STDEV.P(J$30:J$141)
etc.
A more precise view might be:
j12=F12*STDEV.P(J$30:J$141)+AVERAGE(J$30:J$141)
Regarding these 2 different formulas, would you please tell what EXACT DIFFERENCE, does that makes. And which one is preferred over the other, if we are trying to find the middle values and the outlier values from a data sample, by using the concept of standard deviations.

Thanks again for giving it a shot.
0
 
LVL 26

Accepted Solution

by:
Fred Marshall earned 2000 total points
ID: 41860802
I really do not know if the data is zero mean or not.
Well you already have the data in Excel and AVERAGE will give the mean and it comes out around 0.2..
Regarding these 2 different formulas, would you please tell what EXACT DIFFERENCE, does that makes. And which one is preferred over the other, if we are trying to find the middle values and the outlier values from a data sample, by using the concept of standard deviations.
The exact difference is a constant equal to AVERAGE(J$30:J$141).  The idea is that there is an UNDERLYING normal distribution around some mean value.  i.e. where does the peak of the normal curve reside?  Since the average of the data is only around 0.2 then one might choose to ignore it OR one might choose to *decide* for a variety of real-world reasons that the process would be zero mean.  So, expressed "in pictures", this added constant value shifts the normal distribution so the peak is located at the mean value of the DATA.
If a difference of 0.2 matters then you might include it.  If not, then you might not include it.  To me it hardly makes any difference but only you will know if it changes the desired outcome for you in any material way.
If we assume that the data is not symmetrically distributed around the mean, then does that means we cannot use Standard Deviations based concepts on this data,
No but it may mean that a normal distribution isn't perhaps the best UNDERLYING MODEL for the DATA.

Philosophically speaking:
You have the DATA.
You conjure up a MODEL that matches the DATA as well as you can or need.  This MODEL represents what I called earlier the UNDERLYING DISTRIBUTION.
It is very common to use a normal distribution as a model because the real world often behaves that way - or at least behaves that way well enough.  And there are good arguments for this.
But you could create all sorts of models using something like (for example) a Weibull distribution with the parameters for that model selected by matching it to the data.
[See: http://www.niar.wichita.edu/coe/NCAMP_Documents/Publications/NCAMP_Techincal_Presentations/Distribution_for_small_sample_sizes_and_fatigue_data.pdf]
But, as I mentioned before, with only 112 values given you are going to be making assumptions when trying to fit a function all the way out to +/-4 sigma.  So using a normal distribution is not only easier but may also be the best choice for matching and predicting reality.
1
 

Author Closing Comment

by:Wanu San
ID: 41875086
Thank you Fred Marshall
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article descibes how to create a connection between Excel and SAP and how to move data from Excel to SAP or the other way around.
Access developers frequently have requirements to interact with Excel (import from or output to) in their applications.  You might be able to accomplish this with the TransferSpreadsheet and OutputTo methods, but in this series of articles I will di…
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…

801 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question