Solved

Finding the lowest and upper point of expected variation

Posted on 2013-05-22
Last Modified: 2013-06-05
How do I find the lowest point of expected variation and the upper point for these numbers?
7.51
7.57
7.55
7.53
7.53
7.56
7.52
7.58
7.55
7.53
7.56
7.58
7.55
7.55
7.54
7.57
7.54
7.55
7.55
7.56
7.54
7.56
7.56
7.55
7.54
7.55
7.56
7.57
7.54
7.55
7.54
7.55
7.53
7.54
7.55
Question by:Jamie33
31 Comments
 
LVL 27

Expert Comment

by:d-glitch
ID: 39188610
What's wrong with just using the max and min, which are 7.58 and 7.51 respectively?
 

Author Comment

by:Jamie33
ID: 39188616
Because it isn't that simple.
 
LVL 27

Expert Comment

by:d-glitch
ID: 39188619
What do the numbers mean, and how are they generated?
 
LVL 27

Expert Comment

by:d-glitch
ID: 39188623
How is it more complicated?  What is going on that you haven't told us?
How accurate do you have to be?
 
LVL 27

Expert Comment

by:d-glitch
ID: 39188659
The sample you posted has 35 numbers between 7.51 and 7.58.
How many more numbers will you be looking at?
What are the consequences of getting the max or min value wrong?

You could assume that this sample is generated by a normal process with some mean and standard deviation.  

If you make that assumption, then you can find the probability of getting at least one value greater than or less than some limit in a particular number of tests.
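
As a rough illustration of that calculation, here is a minimal Python sketch. It assumes the values are independent draws from a normal process whose true mean and standard deviation are known; the function name `prob_at_least_one_outside` and the choice of 3-sigma limits are illustrative and not taken from this thread.

```python
from statistics import NormalDist

def prob_at_least_one_outside(k, n):
    """Chance that at least one of n independent normal values
    falls more than k standard deviations from the mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    inside = std_normal.cdf(k) - std_normal.cdf(-k)  # coverage of +/- k sigma
    return 1.0 - inside ** n

# Assumed example: 3-sigma limits and a few different numbers of tests.
for n in (1, 35, 1000):
    print(n, round(prob_at_least_one_outside(3, n), 3))
```

The more values you look at, the more likely it becomes that at least one of them lands outside any fixed limit.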
 

Author Comment

by:Jamie33
ID: 39188687
Mean for WIDTH: 7.550
Standard deviation for WIDTH: 0.030
What is the lowest point of expected variation?
What is the upper point of expected variation?

What is expected variation?  

We know that 68% of the data from a normal process are expected to fall within + or - 1 sigma (standard deviations) from the mean.

We know that 95% of the data from a normal process are expected to fall within + or - 2 sigma (standard deviations) from the mean.

We know that 99.7% of data from a normal process are expected to fall within + or - 3 sigma(standard deviations) from the mean.

So, the expected variation that we would likely see in any normally distributed process is between + and - 3 sigma (standard deviations) of the mean.
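
A small Python sketch of the rule just described, using the mean and standard deviation quoted in this comment. The helper names `sigma_limits` and `normal_coverage` are made up for illustration, and the coverage figures assume a truly normal process.

```python
import math

MEAN = 7.550  # mean quoted in this thread
SD = 0.030    # standard deviation quoted in this thread

def sigma_limits(mean, sd, k):
    """Return (lower, upper) limits k standard deviations either side of the mean."""
    return mean - k * sd, mean + k * sd

def normal_coverage(k):
    """Fraction of a normal distribution lying within +/- k sigma of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    low, high = sigma_limits(MEAN, SD, k)
    print(f"+/-{k} sigma: {low:.3f} to {high:.3f}, coverage {normal_coverage(k):.1%}")
```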
 
LVL 84

Expert Comment

by:ozo
ID: 39188717
So do you just want 7.550 +/- 3*0.030 ?
 

Author Comment

by:Jamie33
ID: 39188724
Yes
 
LVL 84

Expert Comment

by:ozo
ID: 39188769
Do you need a formula for Standard Deviation?
http://mathworld.wolfram.com/StandardDeviation.html
 
LVL 27

Expert Comment

by:d-glitch
ID: 39188777
That is all correct.

The expected variation depends on the mean and standard deviation of the process and the number of measurements in the sample.

But you have found the characteristics of the sample, not the process:
Mean       7.550
Std Dev    0.030

So you might want to say that 95% of the samples fall between 7.490 and 7.610 (the mean ± 2 × 0.030).
And they certainly do.

But shouldn't 5% of the samples fall outside of this range?  It isn't happening.
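
One way to check this directly is to count how many of the posted values fall inside the ±2 sigma range and to recompute the sample statistics. The sketch below uses only the 35 values from the question and the quoted mean and standard deviation; the variable names are illustrative. When run on the posted values, the sample standard deviation appears to come out nearer 0.016 than the quoted 0.030, which would fit with the lack of scatter.

```python
import statistics

# The 35 values posted in the question.
widths = [
    7.51, 7.57, 7.55, 7.53, 7.53, 7.56, 7.52, 7.58, 7.55, 7.53,
    7.56, 7.58, 7.55, 7.55, 7.54, 7.57, 7.54, 7.55, 7.55, 7.56,
    7.54, 7.56, 7.56, 7.55, 7.54, 7.55, 7.56, 7.57, 7.54, 7.55,
    7.54, 7.55, 7.53, 7.54, 7.55,
]

sample_mean = statistics.mean(widths)
sample_sd = statistics.stdev(widths)  # sample form: n - 1 in the denominator

# +/- 2 sigma range built from the mean and SD quoted in the thread.
low, high = 7.550 - 2 * 0.030, 7.550 + 2 * 0.030
inside = sum(low <= w <= high for w in widths)

print(f"sample mean {sample_mean:.4f}, sample SD {sample_sd:.4f}")
print(f"{inside} of {len(widths)} values fall within {low:.3f} to {high:.3f}")
```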
 

Author Comment

by:Jamie33
ID: 39188779
No, thanks a lot.
 
LVL 27

Expert Comment

by:d-glitch
ID: 39188788
Are you familiar with the difference between the mean and SD for a sample and for a population?

     http://www.isixsigma.com/tools-templates/sampling-data/basic-sampling-strategies-sample-vs-population-data/
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 39188795
>>  But shouldn't 5% of the samples fall outside of this range?  It isn't happening.

So it seems highly likely that your process is not normally distributed.
 

Author Comment

by:Jamie33
ID: 39188796
So I have both the Mean and the StDev. How did you come up with the 95%?

 
LVL 84

Expert Comment

by:ozo
ID: 39188808
>>  We know that 99.7% of data from a normal process are expected to fall within + or - 3 sigma (standard deviations) from the mean.

But if we don't know whether a set of numbers was generated from a normal process, this may not be relevant.
 

Author Comment

by:Jamie33
ID: 39188844
I believe the following:

Lowest = StDev times -3.00 minus the Mean
Upper = StDev times 3.00 minus the Mean

Is my formula wrong?
 
LVL 27

Expert Comment

by:d-glitch
ID: 39188861
>>  How did you come up with the 95%?

I was just trying out your ±2 sigma rule.

If all you have is the data, then that's all you have.
There is really no way to make any sort of reliable prediction.

If you are willing or able to make some assumptions, then you may be able to do more.

Do you have reason to believe your data are randomly selected from a much larger normal distribution?

Is there a specific question you would like to answer or a prediction you would like to make?
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 39188893
There is no "lowest" or "upper".
If your process is normally distributed, then well over 99% should be within 3 standard deviations of the mean (as you suggest), but there could be a number that comes in 50 standard deviations above the mean; it's just very unlikely (unless your dataset is incredibly huge).

So maybe the answer is "There is no upper and lower"
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 39188901
You can get upper and lower bounds for a confidence interval (95%, 98%, etc) but not for the whole thing. Of course that depends on what your distribution really is. I don't think any real life data is really perfectly normally distributed, just close enough that the numbers work out okay.
 
LVL 27

Expert Comment

by:d-glitch
ID: 39188910
The 3 sigma rule would be:

     Lowest  =  Mean - 3*SD
     Highest =  Mean + 3*SD

99.7% of the normal population should fall within these bounds.

If you take one measurement, there is a 99.7% chance it will be within the bounds.

So if you take 1000 measurements, about 997 should be inside and about 3 should be outside.

But your sample seems to have failed the 2 sigma test.  There isn't enough scatter in the data.
So either you don't have a normal distribution or you don't have a random sample.
 
LVL 27

Expert Comment

by:d-glitch
ID: 39189003
Is this an academic exercise or a real world problem?
Are you actually measuring something?  If so, what?
What do the data mean and what do you hope to accomplish by setting boundaries?

How do you know that my first post isn't good enough?

>>  What's wrong with just using the max and min, which are 7.58 and 7.51 respectively.

Or subtracting 0.01 from the min and adding 0.01 to the max, and using the range from 7.50 to 7.59.
 

Author Comment

by:Jamie33
ID: 39189032
It is an academic problem.
 
LVL 27

Expert Comment

by:d-glitch
ID: 39189137
This is the Excel formula for a Normal Dist
      =NORMINV(RAND(), 7.545, 0.03)

If you use that to generate a 35 element sample, you will see that it usually has more scatter than your posted sample.

7.51       7.53       7.53       7.49       7.56       7.50
7.59       7.55       7.57       7.51       7.53       7.57
7.48       7.56       7.48       7.56       7.53       7.53
7.53       7.51       7.49       7.57       7.52       7.52
7.55       7.55       7.56       7.52       7.51       7.53
7.57       7.59       7.57       7.57       7.55       7.57
7.53       7.49       7.55       7.53       7.54       7.55
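
For readers without Excel, roughly the same experiment can be sketched in Python. This is not a translation of the spreadsheet: `random.gauss` draws from the same normal distribution but by a different method than `NORMINV(RAND(), ...)`. The function name `simulate_sample`, the rounding to two decimals, and the seed are assumptions made for illustration.

```python
import random

def simulate_sample(n=35, mean=7.545, sd=0.03, seed=None):
    """Draw n values from a normal distribution and round them to two
    decimals, mimicking the resolution of the posted measurements."""
    rng = random.Random(seed)
    return [round(rng.gauss(mean, sd), 2) for _ in range(n)]

sample = simulate_sample(seed=1)  # the seed is arbitrary; change it to see other samples
print("min", min(sample), "max", max(sample))
```

Running it with several different seeds gives a feel for how much the min and max of a 35-value normal sample typically wander.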
 
LVL 27

Expert Comment

by:d-glitch
ID: 39189169
But what does it all mean?

Maybe you are supposed to assume that your sample is representative of a normal distribution.
Then your 3*sigma rule would be correct at the 0.3% level.

Maybe you are supposed to notice that you don't have a normal distribution.

Maybe the instructor tried to generate a normal dist by hand and did a bad job.

Maybe there is a problem due to the small sample size (35) and the poor resolution (8 bins) of the data.

Maybe it is the sample versus population issue I mentioned earlier.
 
LVL 27

Expert Comment

by:aburr
ID: 39189220
As in any real world situation, you do not have a normal distribution. It may be close enough though. You can use a bootstrap program (see itunes store) to use your data to get a very close approximation to the mean and sd of an equivalent normal distribution.
 
LVL 8

Accepted Solution

by:
ShannonEE earned 500 total points
ID: 39190031
Hi there Jamie33,

Although you haven't exactly used the usual language of statistics

>>  lowest point of expected variation and the upper point for these numbers

I think I understand what you mean.

You want to know what are the lowest value (A) and highest value (B) that all future values will lie between. That is when the data is taken from the same source as the example numbers. With the extra condition that you want the largest possible value for A, and the smallest possible value for B.

Your data looks like -

[Figure: frequency plot of Q_28136099 data]

Now the usual statistical case is that if you want to make inferences about population parameters (in this case a min and max value) then the sample drawn from that population you will use to develop a statistic must be unbiased.  One way of getting an unbiased sample is random sampling.  If the sample could be biased in unknown ways then any statistical  analysis will be invalid!

That said, if you can provide an assurance that the population was normally distributed, then the answers above are the best you will be able to obtain. Note that with n = 35 you should be using the appropriate value from the t-tables with 34 degrees of freedom rather than the value from the normal tables.

Aside

- the problem here is that you are not using the true values from the distribution - those values you don't know - and so you make do with estimating them from the sample. However in estimating the standard deviation from the sample you are using an already estimated value for the mean which means that unless adjusted by using a t value (and not a normal value) it will be too small.


However there are other things we should consider.

1.

Triangle distribution

First, it could well be true that the data came from a triangle distribution. In that case you could estimate the required parameters using the maximum likelihood estimation technique. Now for the normal distribution, the uniform distribution, and the triangle distribution with A and B fixed (where you want to find the position of the mode, the point where it changes direction), it is possible to work out a formula in terms of values in the sample. You just plug those values in and, as they say in the movies, Bob's your uncle. However, to find the 3 important parameters of a triangle distribution (A, B, and C) in the general case there is no formula that I know of. I strongly suspect it is provable that in general a unique analytical solution does not exist. You will need to use numerical mathematics to do some sort of hill climbing technique on the likelihood function to get the best estimates for A and B. Alternatively, using the OpenBUGS program you could develop credible intervals for both A and B using the evidence (that is, the individual values) from your sample.

However if it is not critical then there are other easier ways to get estimates. They will not however be on a solid foundation.
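
To make the numerical approach concrete, here is a deliberately crude Python sketch. It replaces hill climbing with an exhaustive grid search over (A, C, B), which is simpler to write but slower and only as precise as the grid. The function names, the grid step, and the search ranges are all assumptions chosen for illustration, and the result should be read as a rough estimate, not a rigorous maximum likelihood fit.

```python
import math

# The 35 values posted in the question.
widths = [
    7.51, 7.57, 7.55, 7.53, 7.53, 7.56, 7.52, 7.58, 7.55, 7.53,
    7.56, 7.58, 7.55, 7.55, 7.54, 7.57, 7.54, 7.55, 7.55, 7.56,
    7.54, 7.56, 7.56, 7.55, 7.54, 7.55, 7.56, 7.57, 7.54, 7.55,
    7.54, 7.55, 7.53, 7.54, 7.55,
]

def triangle_loglik(xs, a, c, b):
    """Log-likelihood of xs under a triangle density with minimum a, mode c, maximum b."""
    if not (a < c < b):
        return float("-inf")
    total = 0.0
    for x in xs:
        if x <= a or x >= b:
            return float("-inf")  # a value at or beyond the limits has zero density
        if x <= c:
            dens = 2 * (x - a) / ((b - a) * (c - a))
        else:
            dens = 2 * (b - x) / ((b - a) * (b - c))
        total += math.log(dens)
    return total

def grid_search(xs, step=0.002, n_steps=30):
    """Try every (a, c, b) on a coarse grid and keep the most likely triple."""
    lo, hi = min(xs), max(xs)
    a_grid = [lo - step * i for i in range(1, n_steps + 1)]  # candidates below the sample min
    b_grid = [hi + step * i for i in range(1, n_steps + 1)]  # candidates above the sample max
    c_grid = [lo + step * i for i in range(int(round((hi - lo) / step)) + 1)]  # candidate modes
    best = (float("-inf"), None, None, None)
    for a in a_grid:
        for b in b_grid:
            for c in c_grid:
                ll = triangle_loglik(xs, a, c, b)
                if ll > best[0]:
                    best = (ll, a, c, b)
    return best

best_ll, a_hat, c_hat, b_hat = grid_search(widths)
print(f"A ~ {a_hat:.3f}, C ~ {c_hat:.3f}, B ~ {b_hat:.3f}")
```

A real analysis would refine the grid around the best point or hand the log-likelihood to a proper optimiser.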
 

2.

There is yet another attack

For any distribution we have Chebyshev’s inequality, which says
at least 1 - 1/(K * K) fraction of values must fall within K standard deviations from the mean.
(K > 1)
See This discussion of Chebyshev's inequality
In your sample data set (and in fact with any data set whatsoever) this applies. It calculates a wide enough margin that it is always TRUE.  If you want to transfer that knowledge to the unknown distribution that your data came from, you will be using estimates (the sample mean for the population mean, and the sample standard deviation, based on n-1, for the population standard deviation), so there is a (probably very small) chance of making an assertion that is incorrect.
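
As a quick check of the inequality on the posted data, the following Python sketch compares the Chebyshev lower bound with the fraction actually observed. The function names are illustrative. It uses the divide-by-n standard deviation, for which the inequality is guaranteed on the data set itself; the n-1 version mentioned above is slightly larger, so the same bound holds with it too.

```python
import statistics

# The 35 values posted in the question.
widths = [
    7.51, 7.57, 7.55, 7.53, 7.53, 7.56, 7.52, 7.58, 7.55, 7.53,
    7.56, 7.58, 7.55, 7.55, 7.54, 7.57, 7.54, 7.55, 7.55, 7.56,
    7.54, 7.56, 7.56, 7.55, 7.54, 7.55, 7.56, 7.57, 7.54, 7.55,
    7.54, 7.55, 7.53, 7.54, 7.55,
]

def chebyshev_bound(k):
    """Minimum fraction of values within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / (k * k)

def fraction_within(xs, k):
    """Observed fraction of xs lying within k standard deviations of their mean."""
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)  # divide-by-n form
    return sum(abs(x - mean) <= k * sd for x in xs) / len(xs)

for k in (2, 3):
    print(f"K={k}: bound {chebyshev_bound(k):.3f}, observed {fraction_within(widths, k):.3f}")
```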

=====

Questions.

Do you know that the population the data comes from is normally distributed?
Or triangle distributed?
Or uniformly distributed - (I very much doubt this one)?
Do you know if the sampling technique for getting sample values is unbiased?

Hope this makes sense.  Please respond if you want more info or examples.

Ian
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 39190688
>>  You want to know what are the lowest value (A) and highest value (B) that all future values will lie between.

Again, no such numbers exist. You say it is an academic problem, so did they give you a confidence interval to use? What wording does the problem use?
 
LVL 8

Expert Comment

by:ShannonEE
ID: 39193134
Hi there Tommy,

Don't be too hasty.  If the process is truly normally distributed, then arbitrarily large and small numbers are possible. However, this cannot be the case, as the values are measurements of width, which cannot be negative.  Additionally, if they are from an industrial process, then there is going to be some (maybe quite large) upper limit on the width so that the piece will actually fit in the machinery or whatever is producing the items. While such values would be absolute bounds, they would be next to useless: a zero minimum value is more than 250*sd from the mean.

Industrial processes can be influenced by many small perturbations which, if independent, will (by the central limit theorem) combine to give approximately a normal distribution. However, it is in the tails where any approximate fit to a normal could start to break down, especially if influenced by absolute minima and maxima due to physical considerations.

It all boils down to a practical problem of knowing -

1.

How they want to specify the limits - eg no more than 1% chance of a new item being outside the limits or a real absolute -never go past - limit

2.

The physical aspects of the process generating the items - for any absolute limits and the squeeze that will put on the distribution tails

3.

How the process behaves in the case of unusually large or small items - eg do they have manual overrides to prevent such values and thus generating a truncated distribution....


My feeling is that this is a poorly posed academic question, however Jamie33 may be able to provide more useful information.

Ian
 
LVL 8

Expert Comment

by:ShannonEE
ID: 39215256
Hi there Modulus_Twelve,

I would reject (A). The asker has been forced to consider issues and hence has benefited from the combined help of the experts.

I would reject (B) as so far it doesn't appear that the asker has a value for these lowest and highest points as requested.

Hence (C) is my recommendation. However, I don't believe that the asker has so far "found" an answer or (probably more importantly, since http:#a39189032 says it is academic) a method to produce an answer.  Maybe an option (D): continue to keep the Q open.

The question was poorly worded, but for someone naive about the subject area this is understandable. Hence  the asker was asked many times (http:#a39188659 , http:#a39188808 , http:#a39188861 and http:#a39190031 ) about this.  Unfortunately there was no forthcoming answer.  Do you want to determine the minimum and maximum of all items coming from the process that this dataset was sampled from?  and  Is that process normally distributed?

There are many curly aspects to questions in this area (like given a probability of all data points in the next sample being within the bounds, then those bounds will depend on the size of that future sample: the smaller the sample the closer those bounds are together).
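
That dependence on the size of the future sample can be sketched as follows, assuming independent values from a normal process whose true mean and standard deviation are known (which is not the case here, so treat the numbers as optimistic). The function name `k_for_all_inside` and the 95% target are illustrative choices.

```python
from statistics import NormalDist

def k_for_all_inside(n, prob=0.95):
    """Half-width, in standard deviations, such that all n independent
    normal values fall inside mean +/- k*sd with the given probability."""
    per_point = prob ** (1.0 / n)  # coverage each single value must achieve
    return NormalDist().inv_cdf((1 + per_point) / 2)

for n in (1, 35, 1000):
    print(n, round(k_for_all_inside(n), 2))
```

The required multiple of the standard deviation grows with n, so bounds meant to contain every value of a larger future sample have to sit further apart.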

However so far, the asker has been given more than enough to consider in either nailing down the exact problem or calculating an answer.  
For example  http:#a39188610 , http:#a39188717 , http:#a39188893 , http:#a39188910 , http:#a39190031  

If the Q is homework then http:#a39189169 gives a good list of things we need to consider.

Ian