Solved

# Statistical Probabilities - How To Calculate Sample Frequencies

Posted on 2013-01-24
Medium Priority
512 Views
Hi All,

I need some help with statistical calculations - I can't remember everything from 20 years ago!

Situation:

We have a production process that has to hit a mean size with a tolerance.

Mean = 100mm
Tolerance = +/- 2mm

An item is therefore 'good' if it is greater than or equal to 98mm and less than or equal to 102mm, and bad if it is outside that range.

We are willing to accept a maximum 1% chance that, in a run of 10,000 units, there will be at least one item that is bad.

The production process slowly gets out of alignment (due to vibration and other factors), and has to be periodically recalibrated.  The misalignment is symmetrical in terms of which way it goes out (it is just as likely to shift the mean upwards as downwards).

In order to maintain the quality, we are proposing to do the following:

Define a range with mean = 100mm, Tolerance = +/- X

where X < 2 mm

We will then sample the production output once every Y items.

If a sampled item falls outside of +/- X, then production is stopped, maintenance recalibrates everything back to the original state, and the production process is restarted.

Ultimately, I need to calculate the sampling frequency (Y) we have to use to deliver on the objective.

I would likely be doing the calculations by putting them into an excel spreadsheet, but that seems kind of secondary - that just happens to be the brand of calculator I am using.

I suspect I need to know the Standard Deviation of the actual outputs of the process when it is freshly calibrated.  If so, then my first thought is that I would need to get a sufficiently large run (100 items - perhaps more, but it has to be practical; ideally this sample size could be a parameter in the probability calculation), have them all manually checked, and from that data calculate an actual mean and Standard Deviation.  I have been *told* that the actual output mean is 100mm, but the production people could not give me a standard deviation figure.
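
As a sketch of that first step, here is how the sample mean and standard deviation could be computed (Python used here purely as the calculator; the ten measurements below are made-up placeholders for the real calibration run):

```python
import statistics

# Hypothetical calibration-run measurements in mm (the real run would be
# ~100 manually checked items, as described above).
measurements = [100.1, 99.8, 100.3, 99.7, 100.0, 100.2, 99.9, 100.4, 99.6, 100.0]

mean = statistics.mean(measurements)
# Sample standard deviation (n-1 denominator), the usual estimator when
# the true process parameters are unknown.
sd = statistics.stdev(measurements)

print(f"estimated mean = {mean:.3f} mm, estimated sd = {sd:.3f} mm")
```

In Excel the equivalents would be AVERAGE and STDEV over the data column.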

What I need from you is:

1) What additional information is required in order to do the calculation?

2) What is the actual calculation that I have to do to determine Y?

Please do post any queries or clarifications required.

Thanks,

Alan.
Question by:Alan

LVL 84

Expert Comment

ID: 38816909
Without a large number of questionable assumptions, I wouldn't want to rely on any estimate based on a run of less than a million.

LVL 18

Author Comment

ID: 38817000
Okay, but given what we actually have, how do we move forward to the objective?

LVL 8

Accepted Solution

ShannonEE earned 1000 total points
ID: 38817106
Hi there Alan,

> We are willing to accept a maximum 1% chance, that in a run of 10,000 units, there will be at least one item that is bad.

That is, for every 1,000,000 units produced, 990,000 units in 99 batches will be within tolerance, and of the remaining 10,000 units in 1 batch, 1 (or more) could be out of tolerance.

For a single out of tolerance item the probability would be 1E-6 (1.0 * 10 to the -6 or 1 in a million).

Assuming that you have independence between units ** then 2 or more defectives in 1,000,000 units is of the order of 1E-12 which is too small to make any change to the solution.
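
That per-item figure can be checked directly. A minimal sketch, assuming independence between units as above:

```python
batch = 10_000
target = 0.01  # max 1% chance of at least one bad item per batch

# Solve 1 - (1 - p)**batch = target for the per-item defect probability p.
p = 1 - (1 - target) ** (1 / batch)
print(f"per-item defect probability ~ {p:.3e}")  # ~1.005e-06

# Sanity check: probability of at least one bad item in one batch of 10,000.
print(f"P(>=1 bad in a batch) = {1 - (1 - p) ** batch:.4f}")  # 0.0100
```

So the exact per-item target is about 1.005E-6, which rounds to the 1E-6 quoted above.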

The solution settles down to a question about relative costs. First there is the cost of measurement. Next, when the testing detects that the production process is going out of tolerance, there is a cost to manufacturing of stopping production and re-adjusting the machinery.
Those costs need to be balanced against the cost of exceeding your target of no more than 1 batch in 100 with 1 (or more) defectives. You might consider this cost to be infinite, but if it were truly infinite then the solution would be to examine and measure 990,000 items in each 1,000,000!

Do you have an idea of these (relative) costs?

~~~~

From the manufacturing perspective, is there information about the "speed" of movement of the mean away from 100 mm when the process starts to go wrong? For example, your engineers may be able to say that the mean will not move faster than 0.001 mm per item.

There is a big body of statistical work on process control as well as on sampling of batches for defectives. The process control work develops "charts" where you constantly plot the data as sampled against time. You pick a formula (based on cumulative sums and/or standard deviations) to detect the "out of control" situation.

Ian

** Note on independence: When the process is "in control" it is reasonable to assume that the variation from the mean of 100 mm is independent from unit to unit. However, when the process is straying from the 100 mm value, the likelihood of the next unit having a positive (or negative) error will be related to the size and direction of the error on the current unit. That is, errors will not be independent.

LVL 27

Assisted Solution

aburr earned 700 total points
ID: 38817294
You need standard deviation information.
Measure it
If the sd is 3mm you are in deep trouble.
If the sd is 0.00003 mm you are in great shape
Find out how fast the 100mm shifts and time your recalibrations.

LVL 18

Author Comment

ID: 38817396
Hi All,

I have been supplied with some raw data from the production team.

I have attached in an excel spreadsheet (I can supply a plain text file if you prefer).

In the attached, all I received was the data in Data!A2:A1001.

I have then added the NORMDIST function in column B, and charted the output (separate chart sheet).

I can only assume that this data was generated diligently and the process of measurement was good - they say it was.

The shape of the curve is, perhaps, a little too good to be true?  However, I'll go with this for now.

The production team claim that since this data was collected (early in the commissioning process), they have reduced the variability (and hence, I guess, the standard deviation) further, by bolting things down to reduce shaking, but they cannot supply me with an updated set of measurements to prove that it has been effective.  I have requested updated measurements once they have them, but I figure it will just change the ultimate answer.

Hope this helps.

Alan.
Sample-Production-Run-Of-1000-Ac.xls

LVL 18

Author Comment

ID: 38817405
@ShannonEE:

> That is for every 1,000,000 units produced, 990,000 in 99 batches will be within tolerance, and of the remaining units in 1 batch 1 (or more) could be out of tolerance.

That is correct - we want less than 1% of the deliveries (10,000 units in each) to have any items out of spec.

> For a single out of tolerance item the probability would be 1E-6 (1.0 * 10 to the -6 or 1 in a million).
>
> Assuming that you have independence between units ** then 2 or more defectives in 1,000,000 units is of the order of 1E-12 which is too small to make any change to the solution.

Okay.

> The solution settles down to a question about relative costs. First there is the cost of measurement. Next, when the testing detects that the production process is going out of tolerance there is a cost to manufacturing of stopping production and re-adjusting the machinery. Those costs need to be balanced against the cost of exceeding your target of more than 1 batch in 100 with 1 (or more) defectives. You might consider this cost is infinite, but if it was truly infinite then the solution would be examine and measure 990,000 items in each 1,000,000 !
>
> Do you have an idea of these (relative) costs?

Not yet, but I will investigate.

Can we work on some assumed figures for now, and when I have more accurate ones, I can update the calculations?  If so, I will post back with some estimates once I have them.

> From the manufacturing perspective, is there information about the "speed" of movement of the mean away from 100 mm when the process starts to go wrong. For example your engineers may be able to say that the mean will not move quicker than 0.001 mm each item.
>
> There is a big body of statistical work on process control as well as on sampling of batches for defectives. The process control work develops "charts" where you constantly plot the data as sampled against time. You pick a formula (based on cumulative sums and/or standard deviations) to detect the "out of control" situation.

I will ask whether any measurement of the 'drift speed' has been attempted.

> ** Note on independence: When the process is "in control" it is reasonable to assume that the variation from the mean of 100 mm is independent from unit to unit. However, when the process is straying from the 100 mm value, the likelihood of the next unit having a positive (or negative) error will be related to the size and direction of the error on the current unit. That is, errors will not be independent.

Yes - noted.

I'll be back :-)

Alan.

I'll be back :-)

Alan.

LVL 18

Author Comment

ID: 38817412
@aburr:

> You need standard deviation information.
> Measure it
> If the sd is 3mm you are in deep trouble.
> If the sd is 0.00003 mm you are in great shape

I have attached (above) sample data.  If my calculations are correct, the SD is about 0.75mm so, predictably, somewhere in the middle of those :-)

> Find out how fast the 100mm shifts and time your recalibrations.

I guess that is the same question that @ShannonEE asked above.  I will see if there is any data on this.

Alan.

LVL 27

Assisted Solution

aburr earned 700 total points
ID: 38817480
"We are willing to accept a maximum 1% chance, that in a run of 10,000 units, there will be at least one item that is bad."
-
You are in trouble.   Either you need better measurements of the 100mm value or you must design your manufacturing process for better precision.
-
Consider a production run of 10,000 units. Measure 100, getting  100 average +_ 0.75 mm. sd.
The tolerance of 2 mm is 2.6 standard deviation. That means that if everything is independent, that 0.5% will be 102 or greater with a similar number below 98mm.
Thus out of the 10,000 units produced 100 will be out of tolerance.
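
That back-of-envelope figure can be checked with the exact normal tail (z = 2/0.75 ~ 2.67). The assumptions, normality and independence, are as stated above; the conclusion is the same:

```python
from statistics import NormalDist

# Mean and s.d. taken from the attached sample data.
nd = NormalDist(mu=100.0, sigma=0.75)

# Probability that a single unit falls outside the 98-102 mm tolerance.
p_bad = nd.cdf(98.0) + (1 - nd.cdf(102.0))
print(f"per-unit defect probability ~ {p_bad:.4f}")           # ~0.0077
print(f"expected defects per 10,000 ~ {p_bad * 10_000:.0f}")  # ~77
```

About 77 defects per 10,000 with the exact z-value, versus the rounded 100 above - either way, orders of magnitude worse than the 1-in-a-million target.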

LVL 8

Assisted Solution

ShannonEE earned 1000 total points
ID: 38817622
Hi there Alan,

It's getting close to going home time, with a long weekend coming up. Can you wait a few days?

The example data you supplied shows that you should not just do simple calculations using the mean and standard deviation. This is a (very well behaved) time series. If you don't use time-series techniques with time-series data then variances (and standard deviations, standard errors) WILL BE wrong. Averages can sometimes also be wrong.

This example data suggests that the drift in mean level is of the order of 1 mm per 400 items, or 0.0025 mm per item. Do your engineers agree that that would be usual/typical? More importantly, could the drift ever get significantly greater than that?

The first differences in item measurement appear to be a stationary process, with significant negative autocorrelation. This suggests that if a particular item is a little bit bigger than usual, then the immediate next item will likely be a little bit smaller than usual. However, there is little influence between items more than 1 step apart.
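
For anyone wanting to reproduce that diagnosis, a minimal sketch (the ten values below are invented stand-ins; the real calculation would run over the 1,000 measurements in the attached spreadsheet):

```python
# Hypothetical short series standing in for the sample data.
data = [100.0, 100.3, 100.1, 100.4, 100.2, 100.5, 100.2, 100.6, 100.3, 100.6]

# First differences remove the slow drift in the mean level.
diffs = [b - a for a, b in zip(data, data[1:])]

mean_d = sum(diffs) / len(diffs)
var_sum = sum((d - mean_d) ** 2 for d in diffs)
# Lag-1 autocorrelation: a strongly negative value means a bigger-than-usual
# step tends to be followed by a smaller-than-usual one.
lag1 = sum((diffs[i] - mean_d) * (diffs[i + 1] - mean_d)
           for i in range(len(diffs) - 1)) / var_sum

print(f"lag-1 autocorrelation of first differences: {lag1:.2f}")  # strongly negative
```

On this toy series the lag-1 autocorrelation comes out strongly negative, matching the pattern described above.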

This is not really a process with normal errors, in that the "tails" cannot extend out as a normal distribution's do. They will have a cutoff, perhaps determined by physical aspects of the machine.

The drift will be expected as a normal part of engineering practice, and so what you require is some parameters to be set for the machine re-adjustment, before the drift gets too close to the boundary (100 +/- 2) for comfort. Your comfort level is going to be determined by the relative costs I mentioned above (which will affect your risk aversion).

Although we could do lots of stats, the relatively high negative lag-1 autocorrelation and the limit on the size of "errors" result in a reasonably tight band of values when plotted.

A suitable technique would be to project forward the upper and lower bounds until they hit a line inside the boundary (e.g. 100 +/- 1.8). The 1.8 instead of the 2.0 is a safety margin, which would be calculated with reference to your relative costs and the first-difference error profile (s.d. ~ 0.23).

The sampling time interval must be such that the lines will not hit a boundary before the next sampling. Additionally the sampling scheme needs to establish good estimates of what the current bounds are.
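
A crude way to size that interval, under loudly flagged assumptions: the 0.0025 mm/item drift and the 1.8 mm action line are the illustrative figures from this thread, not measured values, and item-to-item noise is ignored:

```python
drift_per_item = 0.0025  # mm per item, suggested by the sample data above
action_limit = 1.8       # stop-and-recalibrate line, mm from the 100 mm target
spec_limit = 2.0         # tolerance, mm from target

# Worst case: the mean sits just inside the action line at one sample and
# must not drift past the spec limit before the next sample is taken.
max_interval = (spec_limit - action_limit) / drift_per_item
print(f"sample at least every {max_interval:.0f} items")  # 80
```

In practice Y would be chosen smaller than this, to leave room for the first-difference noise (s.d. ~ 0.23 mm) and for estimation error in the drift rate.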

Ideally you would be looking to set these parameters as determined by the latest set of historical data on hand instead of using values we could determine now for all time.

After determining parameters then you can easily calculate binomial probabilities for a batch with a defective. Using that value we can consequently calculate probabilities of 0, 1, 2 ... batches with defectives in (say) 1000 batches. This assumes machine recalibration/reset when necessary as determined by the parameters.
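
As a sketch of that last step (per-item probability at the 1-in-a-million target, batches of 10,000, independence assumed throughout):

```python
import math

p_item = 1e-6   # per-item defect probability at the target level
batch = 10_000
n_batches = 1000

# P(a given batch contains at least one defective) -- roughly 1%.
p_batch_bad = 1 - (1 - p_item) ** batch
print(f"P(batch has a defective) ~ {p_batch_bad:.4f}")  # ~0.0100

# Binomial probabilities of exactly k bad batches out of 1000.
for k in range(3):
    pk = (math.comb(n_batches, k)
          * p_batch_bad ** k * (1 - p_batch_bad) ** (n_batches - k))
    print(f"P({k} bad batches in {n_batches}) = {pk:.2e}")
```

Note that with 1,000 batches at ~1% each, some bad batches are almost certain; the question is whether the count stays near the expected ~10.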

More thoughts later.

Ian

LVL 18

Author Comment

ID: 38817820
Thanks Guys - I won't be able to get any new data until next week now, but I appreciate your help so far, and I'll get back to you once I have more info.

@aburr:  I see what you are saying.  I was hoping that when the production guys said things have improved since that data set was run, they meant *really* improved, as the numbers do seem to imply that the goal of less than a 1% chance of even 1 item out of 10,000 being wrong is a long way off.

@Ian:  I will find out if the production guys have data on the drift early next week.

Have a good weekend.

Alan.

LVL 84

Assisted Solution

ozo earned 300 total points
ID: 38817843
> You need standard deviation information.
> Measure it

While this is an important step, it is far from adequate.
Many of the usual calculations based on standard deviation information tend to assume a Gaussian distribution. But while a Gaussian distribution can be a good approximation to many real-world distributions for many purposes, one of the places where the approximation often turns out to be very poor is in the extreme tails of the distribution. One-in-a-million events often turn out to be a lot further from, or a lot closer to, the mean than a Gaussian extrapolation would predict.
And processes going out of alignment also often cause changes in distributions, so the conditions you are most interested in may be exactly the conditions in which your statistical assumptions are most likely to be violated.
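
To make the tail point concrete: a Laplace distribution with the *same* standard deviation as the sample data puts far more probability at the Gaussian one-in-a-million point. A sketch (the Laplace choice is purely illustrative, not a claim about this process):

```python
import math
from statistics import NormalDist

sigma = 0.75            # sample s.d. from the thread
x = 4.75 * sigma        # roughly the Gaussian one-in-a-million distance

gauss_tail = 1 - NormalDist(0.0, sigma).cdf(x)

# A Laplace distribution with the same s.d. has scale b = sigma / sqrt(2)
# and the analytic upper tail P(X > x) = 0.5 * exp(-x / b).
b = sigma / math.sqrt(2)
laplace_tail = 0.5 * math.exp(-x / b)

print(f"Gaussian tail: {gauss_tail:.2e}")   # ~1e-06
print(f"Laplace tail:  {laplace_tail:.2e}") # hundreds of times larger
```

Same mean, same standard deviation, yet the 1-in-a-million estimate is off by a factor of several hundred, which is exactly the danger described above.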

To achieve the level of assurance you are asking for may take a very careful analysis of every aspect of the production process, to thoroughly understand all the potential causes of out-of-tolerance output and misalignment. Otherwise (or in addition) you may need to collect a lot of data, perhaps requiring careful monitoring of millions of cycles of misalignment and recalibration.
Or, if you can develop a good model for the kinds of process errors that should be expected given the nature of the process, you may be able to adequately verify the model over hundreds of cycles.

LVL 27

Assisted Solution

aburr earned 700 total points
ID: 38820256
"While this is an important step, it is far from adequate."

Quite so, but the information was designed to emphasize the magnitude of the problem and the additional work needed to solve it.

"Many of the usual calculations based on standard deviation information tend to assume"
(MUST ASSUME)  "a Gaussian distribution. But while a Gaussian distribution can be a good approximation to many real world distributions ..."

ShannonEE appears to know what he is talking about and offers a way toward a solution.

LVL 18

Author Closing Comment

ID: 38959579
Hi All,

This project has been put on hold, so I am going to close this out, and allocate points now.

If or when it comes back, I suspect I will be back with another question, but at least I will be starting further down the path due to your assistance here!

Thanks,

Alan.
