# Statistical Probabilities - How To Calculate Sample Frequencies

Hi All,

I need some help with statistical calculations - I can't remember everything from 20 years ago!

Situation:

We have a production process that has to hit a mean size with a tolerance.

Mean = 100mm
Tolerance = +/- 2mm

So an item is 'good' if it is greater than or equal to 98mm and less than or equal to 102mm, and 'bad' if it is outside that range.

We are willing to accept a maximum 1% chance that, in a run of 10,000 units, there will be at least one item that is bad.

The production process slowly gets out of alignment (due to vibration and other factors), and has to be periodically recalibrated.  The misalignment is symmetrical in terms of which way it goes out (it is just as likely to lift the mean upwards as downwards).

In order to maintain the quality, we are proposing to do the following:

Define a range with mean = 100mm, Tolerance = +/- X

where X < 2 mm

We will then sample the production output once every Y items.

If an item sampled falls outside of +/- X then the production process gets stopped, and maintenance is done to recalibrate everything back to the original status, and then restart the production process.

Ultimately, I need to calculate the sampling frequency (Y) we have to use to deliver on the objective.

I would likely be doing the calculations in an Excel spreadsheet, but that seems secondary - it just happens to be the brand of calculator I am using.

I suspect I need to know the standard deviation of the actual outputs of the process when it is freshly calibrated.  If so, my first thought is that I would need to get a sufficiently large run (100 items, perhaps more, but it has to be practical; ideally this sample size could be a parameter in the probability calculation), have them all manually checked, and from that data calculate an actual mean and standard deviation.  I have been *told* that the actual output mean is 100mm, but the production people could not give me a standard deviation figure.

What I need from you is:

1) What additional information is required in order to do the calculation?

2) What is the actual calculation I have to do to determine Y?

Please do post any queries or clarifications required.

Thanks,

Alan.

Commented:
Hi there Alan,

We are willing to accept a maximum 1% chance that, in a run of 10,000 units, there will be at least one item that is bad.
That is, for every 1,000,000 units produced, 990,000 units in 99 batches will be within tolerance, and of the remaining units in 1 batch, 1 (or more) could be out of tolerance.

For a single out of tolerance item the probability would be 1E-6 (1.0 * 10 to the -6 or 1 in a million).

Assuming that you have independence between units ** then 2 or more defectives in 1,000,000 units is of the order of 1E-12 which is too small to make any change to the solution.
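
That per-item figure can be checked directly from the stated target. A minimal Python sketch (assuming, as above, independence between units):

```python
# Target: at most a 1% chance of one or more defectives in a 10,000-unit run.
batch_size = 10_000
max_batch_risk = 0.01

# Solve 1 - (1 - p)**batch_size = max_batch_risk for the per-item
# defect probability p (assuming independent units).
p = 1 - (1 - max_batch_risk) ** (1 / batch_size)
print(p)  # ~1.005e-06 -- the "1 in a million" figure

# Sanity check: that p gives back a 1% batch risk.
print(1 - (1 - p) ** batch_size)  # ~0.01
```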

The solution settles down to a question about relative costs. First there is the cost of measurement. Next, when the testing detects that the production process is going out of tolerance there is a cost to manufacturing of stopping production and re-adjusting the machinery.
Those costs need to be balanced against the cost of exceeding your target of more than 1 batch in 100 with 1 (or more) defectives. You might consider this cost infinite, but if it were truly infinite then the solution would be to examine and measure 990,000 items in each 1,000,000!

Do you have an idea of these (relative) costs?

~~~~

From the manufacturing perspective, is there information about the "speed" of movement of the mean away from 100 mm when the process starts to go wrong? For example, your engineers may be able to say that the mean will not move faster than 0.001 mm per item.

There is a big body of statistical work on process control as well as on sampling of batches for defectives. The process control work develops "charts" where you constantly plot the data as sampled against time. You pick a formula (based on cumulative sums and/or standard deviations) to detect the "out of control" situation.

Ian

** Note on independence: When the process is "in control" it is reasonable to assume that the variation from the mean of 100 mm is independent from unit to unit. However, when the process is straying from the 100 mm value, the likelihood of the next unit having a positive (or negative) error will be related to the size and direction of the error on the current unit. That is, errors will not be independent.

Commented:
Without a large number of questionable assumptions, I wouldn't want to rely on any estimate based on a run of less than a million.

ConsultantAuthor Commented:
Okay, but given what we actually have, how do we move forward to the objective?

Commented:
You need standard deviation information.
Measure it
If the sd is 3mm you are in deep trouble.
If the sd is 0.00003 mm you are in great shape
Find out how fast the 100mm shifts and time your recalibrations.

ConsultantAuthor Commented:
Hi All,

I have been supplied with some raw data from the production team.

I have attached it as an Excel spreadsheet (I can supply a plain text file if you prefer).

In the attached, all I received was the data in Data!A2:A1001.

I have then added the NORMDIST function in column B, and charted the output (separate chart sheet).

I can only assume that this data was generated diligently and the process of measurement was good - they say it was.

The shape of the curve is, perhaps, a little too good to be true?  However, I'll go with this for now.

The production team claim that since this data was collected (early in the commissioning process), they have reduced the variability (and hence, I guess, the standard deviation) further, by bolting things down to reduce shaking, but they cannot supply me with an updated set of measurements to prove that it has been effective.  I have requested updated measurements once they have them, but I figure it will just change the ultimate answer.

Hope this helps.

Alan.
Sample-Production-Run-Of-1000-Ac.xls
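
For anyone following along without the spreadsheet, here is the same calculation sketched in Python. The data below is a random stand-in for Data!A2:A1001 (the real measurements are only in the attachment), and the `normdist_pdf` helper mirrors what Excel's NORMDIST(..., FALSE) computes:

```python
import math
import random

# Random stand-in for Data!A2:A1001 -- assumed mean 100 mm, SD 0.75 mm.
random.seed(1)
data = [random.gauss(100.0, 0.75) for _ in range(1000)]

n = len(data)
mean = sum(data) / n
# Sample standard deviation (n - 1 denominator), as Excel's STDEV uses.
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

def normdist_pdf(x, mu, sigma):
    """Normal density, like Excel's NORMDIST(x, mu, sigma, FALSE)."""
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi)))

print(round(mean, 3), round(sd, 3))
print(normdist_pdf(100.0, mean, sd))  # height of the fitted curve near the mean
```
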

ConsultantAuthor Commented:
@ShannonEE:

That is, for every 1,000,000 units produced, 990,000 units in 99 batches will be within tolerance, and of the remaining units in 1 batch, 1 (or more) could be out of tolerance.

That is correct - we want less than 1% of the deliveries (10,000 units in each) to have any items out of spec.

For a single out of tolerance item the probability would be 1E-6 (1.0 * 10 to the -6 or 1 in a million).

Assuming that you have independence between units ** then 2 or more defectives in 1,000,000 units is of the order of 1E-12 which is too small to make any change to the solution.

Okay

The solution settles down to a question about relative costs. First there is the cost of measurement. Next, when the testing detects that the production process is going out of tolerance there is a cost to manufacturing of stopping production and re-adjusting the machinery.
Those costs need to be balanced against the cost of exceeding your target of more than 1 batch in 100 with 1 (or more) defectives. You might consider this cost infinite, but if it were truly infinite then the solution would be to examine and measure 990,000 items in each 1,000,000!

Do you have an idea of these (relative) costs?

Not yet, but I will investigate.

Can we work on some assumed figures for now, and when I have more accurate ones, I can update the calculations?  If so, I will post back with some estimates once I have them.

From the manufacturing perspective, is there information about the "speed" of movement of the mean away from 100 mm when the process starts to go wrong? For example, your engineers may be able to say that the mean will not move faster than 0.001 mm per item.

There is a big body of statistical work on process control as well as on sampling of batches for defectives. The process control work develops "charts" where you constantly plot the data as sampled against time. You pick a formula (based on cumulative sums and/or standard deviations) to detect the "out of control" situation.

Ian

I will ask whether any measurement of the 'drift speed' has been attempted.

** Note on independence: When the process is "in control" it is reasonable to assume that the variation from the mean of 100 mm is independent from unit to unit. However, when the process is straying from the 100 mm value, the likelihood of the next unit having a positive (or negative) error will be related to the size and direction of the error on the current unit. That is, errors will not be independent.

Yes - noted.

I'll be back :-)

Alan.

ConsultantAuthor Commented:
@aburr:

You need standard deviation information.
Measure it
If the sd is 3mm you are in deep trouble.
If the sd is 0.00003 mm you are in great shape

I have attached (above) sample data.  If my calculations are correct, the SD is about 0.75mm so, predictably, somewhere in the middle of those :-)

Find out how fast the 100mm shifts and time your recalibrations.

I guess that is the same question that @ShannonEE asked above.  I will see if there is any data on this.

Alan.

Commented:
"We are willing to accept a maximum 1% chance, that in a run of 10,000 units, there will be at least one item that is bad."
-
You are in trouble.   Either you need better measurements of the 100mm value or you must design your manufacturing process for better precision.
-
Consider a production run of 10,000 units. Measure 100 of them, getting an average of 100 mm ± 0.75 mm s.d.
The tolerance of 2 mm is about 2.7 standard deviations. That means, if everything is independent, that roughly 0.4% will be 102 mm or greater, with a similar number below 98 mm.
Thus, out of the 10,000 units produced, roughly 75 to 80 will be out of tolerance.
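
A quick check of those tail figures - a sketch assuming a perfectly centred normal process with SD 0.75 mm:

```python
import math

sd = 0.75    # sample standard deviation from the attached run (mm)
tol = 2.0    # tolerance either side of the 100 mm nominal (mm)

z = tol / sd                                    # ~2.67 standard deviations
p_one_side = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z), upper tail
p_bad = 2 * p_one_side                          # out of spec on either side
print(round(z, 2))     # 2.67
print(p_bad)           # ~0.0077
print(p_bad * 10_000)  # ~77 expected bad items per 10,000
```
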

Commented:
Hi there Alan,

It's getting close to going-home time, with a long weekend coming up. Can you wait a few days?

The example data you supplied shows that you should not just do simple calculations using the mean and standard deviation.  This is a (very well behaved) time series. If you don't use time-series techniques with time-series data, then variances (and standard deviations, standard errors) WILL BE wrong. Averages can sometimes also be wrong.

This example data suggests that the drift in the mean level is of the order of 1 mm per 400 items, or 0.0025 mm per item.  Do your engineers agree that that would be usual/typical?  More importantly, could the drift ever get significantly greater than that?
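
One simple way to estimate such a drift rate is the least-squares slope of measurement against item number. A sketch on idealised stand-in data (the real series is in the attached spreadsheet):

```python
# Idealised stand-in: a mean drifting linearly at an assumed rate.
n = 1000
true_drift = 0.0025                            # assumed mm per item
data = [100.0 + true_drift * i for i in range(n)]

# Least-squares slope of measurement vs item number.
xs = list(range(n))
x_bar = sum(xs) / n
y_bar = sum(data) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, data))
         / sum((x - x_bar) ** 2 for x in xs))
print(slope)  # recovers the assumed 0.0025 mm per item
```

On real data the slope would sit on top of the item-to-item noise, so a confidence interval on the slope would be worth computing as well.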

The first differences in the item measurements appear to be a stationary process with significant negative auto-correlation. This suggests that if a particular item is a little bit bigger than usual, then the immediately following item will likely be a little bit smaller than usual. However, there is little influence on items more than one step apart.
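
The lag-1 autocorrelation of the first differences can be computed directly. A sketch on a short toy series chosen to show the alternating (negative-correlation) pattern:

```python
def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a series."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

# Toy series alternating a little high / a little low around a level.
series = [100.0, 100.4, 99.9, 100.5, 99.8, 100.6, 99.7]
diffs = [b - a for a, b in zip(series, series[1:])]  # first differences
print(lag1_autocorr(diffs))  # strongly negative (~ -0.8) for this toy data
```
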

This is not really a process with normal errors, in that the "tails" cannot extend out as a normal does. They will have a cutoff perhaps determined by physical aspects of the machine.

The drift will be expected as a normal part of engineering practice, and so what you require is some parameters to be set for the machine re-adjustment, before the drift gets too close to the boundary (100 +/- 2) for comfort. Your comfort level is going to be determined by the relative costs I mentioned above (which will affect your risk aversion).

Although we could do lots of stats, the relatively high negative lag-1 auto-correlation and the limit on the size of the "errors" result in a reasonably tight band of values when plotted.

A suitable technique would be to project forward the upper and lower bounds until they hit a line inside the boundary (e.g. 100 +/- 1.8). The 1.8 instead of the 2.0 is a safety margin, which would be calculated with reference to your relative costs and the first-difference error profile (s.d. ~ 0.23).

The sampling time interval must be such that the lines will not hit a boundary before the next sampling. Additionally the sampling scheme needs to establish good estimates of what the current bounds are.
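
As a rough illustration of that timing constraint, ignoring item-to-item noise and using hypothetical figures (Ian's drift estimate and an assumed 1.8 mm action limit):

```python
# All figures here are hypothetical/illustrative.
worst_drift = 0.0025   # worst-case drift of the mean, mm per item
action_limit = 1.8     # inner stop-and-recalibrate line, mm from nominal
tolerance = 2.0        # hard tolerance, mm from nominal

# Between consecutive samples the mean can drift at most Y * worst_drift,
# so a process just inside the action limit must not be able to reach the
# hard tolerance before the next sample:
Y = (tolerance - action_limit) / worst_drift
print(Y)  # ~80 -> sample at least every 80 items under these figures
```

This ignores the scatter of individual items around the mean (SD ~ 0.75 mm in the sample data), which in practice eats into the 0.2 mm margin and pushes Y lower; the full calculation would fold in the first-difference error profile mentioned above.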

Ideally you would be looking to set these parameters as determined by the latest set of historical data on hand instead of using values we could determine now for all time.

After determining the parameters, you can easily calculate binomial probabilities for a batch containing a defective. Using that value we can then calculate the probabilities of 0, 1, 2 ... batches with defectives in (say) 1,000 batches. This assumes machine recalibration/reset when necessary, as determined by the parameters.
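
A sketch of those binomial calculations, using the target per-item figure from earlier in the thread:

```python
import math

p_item = 1e-6    # per-item defect probability (target figure)
batch = 10_000
n_batches = 1_000

# Probability that a batch contains at least one defective item.
p_bad_batch = 1 - (1 - p_item) ** batch
print(p_bad_batch)  # ~0.00995, i.e. just under 1%

# Binomial probabilities of exactly 0, 1, 2 bad batches among 1,000.
for k in range(3):
    pk = (math.comb(n_batches, k)
          * p_bad_batch ** k
          * (1 - p_bad_batch) ** (n_batches - k))
    print(k, pk)
```

With p_bad_batch just under 1%, the expected number of bad batches in 1,000 is about 10, which matches the stated target of at most 1 bad delivery per 100.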

More thoughts later.

Ian

ConsultantAuthor Commented:
Thanks Guys - I won't be able to get any new data until next week now, but I appreciate your help so far, and I'll get back to you once I have more info.

@aburr:  I see what you are saying.  I was hoping that when the production guys said things have improved since that data set was run, they meant *really* improved, as the numbers do seem to imply that the goal (less than a 1% chance of even one bad item in 10,000) is a long way off.

@Ian:  I will find out if the production guys have data on the drift early next week.

Have a good weekend.

Alan.

Commented:
You need standard deviation information.
Measure it
While this is an important step, it is far from adequate.
Many of the usual calculations based on standard deviation information tend to assume a Gaussian distribution. But while a Gaussian distribution can be a good approximation to many real world distributions for many purposes, one of the places where the approximation often turns out to be very poor is in the extreme tails of the distribution.  One-in-a-million events often turn out to be a lot further from, or a lot closer to, the mean than a Gaussian extrapolation would predict.
And processes going out of alignment often cause changes in distributions, so the conditions you are most interested in may be exactly the conditions under which your statistical assumptions are most likely to be violated.

To achieve the level of assurance you are asking for may take a very careful analysis of every aspect of the production process, to thoroughly understand all the potential causes of out-of-tolerance output and misalignment.  Otherwise (or in addition), you may need to collect a lot of data - perhaps requiring careful monitoring of millions of cycles of misalignment and recalibration.
Or, if you can develop a good model for the kinds of process errors that should be expected given the nature of the process, you may be able to adequately verify the model over hundreds of cycles.

Commented:
"While this is an important step, it is far from adequate."

Quite so, but the information was designed to emphasize the magnitude of the problem and the additional work needed to solve it.

"Many of the usual calculations based on standard deviation information tend to assume"
(MUST ASSUME)  "a Gaussian distribution. But while a Gaussian distribution can be a good approximation to many real world distributions ..."

ShannonEE appears to know what he is talking about and offers a way toward a solution.

ConsultantAuthor Commented:
Hi All,

This project has been put on hold, so I am going to close this out, and allocate points now.

If or when it comes back, I suspect I will be back with another question, but at least I will be starting further down the path due to your assistance here!

Thanks,

Alan.