• Status: Solved
• Priority: Medium
• Security: Public
• Views: 479

# Statistical Confidence in Software Fix

Hi all,

A piece of software ran for 33 days, during which a problem occurred six times:

Day 1, 10, 17, 18, 24 and 33. (The longest interval between problem days was 9 days.)

On day 34 the software was updated with a version that was believed to fix the problem, and the updated version has now run for 10 consecutive days without the problem occurring.

What can be said about the probability that the problem has been fixed? What is the confidence level based on days passed without seeing the problem? How can this be calculated, and using which statistical model?

E.g. can one say that the problem is fixed with x % probability after y days of trouble-free performance?

It can be assumed that no other parameters have changed that could cause the problem and that the occurrences are not dependent on each other.

Thanks!
mag99x
4 Solutions

Commented:
You can probably model this with a Poisson distribution, although it would be better to have more data.
In the real world, you can't let things stay broken just to get more data.

http://en.wikipedia.org/wiki/Poisson_distribution

There is a formula on the Wikipedia page for 95% confidence limit calculations.
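The confidence-limit formula referred to above can be sketched in code. This is a hedged example, assuming the exact (chi-squared-based) interval for a Poisson mean and that SciPy is available; the numbers are taken from the question (6 faults in 33 days).

```python
# Sketch: exact confidence interval for a Poisson mean, the kind of
# calculation described on the Wikipedia page. Assumes SciPy is
# installed; the inputs come from the question (6 faults in 33 days).
from scipy.stats import chi2

k = 6        # observed faults
days = 33    # observation period
alpha = 0.05 # for a 95% interval

# Exact bounds on the expected number of faults per 33-day period
lower = 0.5 * chi2.ppf(alpha / 2, 2 * k)
upper = 0.5 * chi2.ppf(1 - alpha / 2, 2 * (k + 1))

print(f"95% CI for faults per 33 days: ({lower:.2f}, {upper:.2f})")
print(f"95% CI for faults per day: ({lower / days:.3f}, {upper / days:.3f})")
```

The wide interval (roughly 2.2 to 13.1 faults per 33 days) illustrates the point about having very little data: the underlying fault rate is only loosely pinned down by six observations.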

Commented:
Crude calculations:

Six faults in 33 days  ==>  one fault every 5.5 days.

If the problem were not fixed, the probability of a fault would be 1/5.5 ==> 0.182 per day.

The probability of 10 fault-free days in a row would be (1 - 0.182)^10 ==> 0.134.

The problem is probably fixed, but you will be more certain in a few more days.
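The crude calculation above can be written out as a short sketch, treating each day as an independent Bernoulli trial with the observed fault rate:

```python
# Sketch of the crude calculation: each day is an independent
# Bernoulli trial with fault probability p = 6 faults / 33 days.
p = 6 / 33                 # ~0.182 faults per day
clean_days = 10

# Probability of 10 fault-free days in a row if nothing had changed
p_no_fault = (1 - p) ** clean_days
print(f"{p_no_fault:.3f}")  # ~0.134
```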

Commented:
Just thought I'd try to add a bit, not to step on any toes.

> Crude calculations:

Actually, that's a simple yet flawless analysis of the data (assuming it's exponentially/Poisson distributed, which is likely to be at least very close).

> The probability of 10 fault-free days in a row would be (1 - 0.182)^10 ==> 0.134.
> The problem is probably fixed, but you will be more certain in a few more days.

13% is usually considered too high to draw a conclusion.
ln(.05)/ln(1 - .182) ≈ 15 error-free days gives you a (1 - .05) = 95% confidence that the problem is gone.
23 days for 99%.
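The day counts above come from solving (1 - p)^n ≤ alpha for n. A minimal sketch of that calculation (the function name is my own, not from the thread):

```python
import math

# How many consecutive fault-free days are needed before the chance
# of such a streak under the old fault rate drops below alpha?
p = 6 / 33  # ~0.182, the observed daily fault probability

def days_for_confidence(alpha):
    # Solve (1 - p)^n <= alpha for n, rounding up to whole days
    return math.ceil(math.log(alpha) / math.log(1 - p))

print(days_for_confidence(0.05))  # 15 days for 95% confidence
print(days_for_confidence(0.01))  # 23 days for 99% confidence
```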

Commented:
>>  TommySzalapski

I agree completely.  The analysis has to assume that the faults are the result of a single Poisson process.

With so little data, I'm not sure how you can justify the Poisson assumption,
except by necessity.  There is really nothing better.

Commented:
> With so little data, I'm not sure how you can justify the Poisson assumption

True. However, remember that if a distribution is memoryless (the fact that an error occurred yesterday does not affect the chance that it happens today), then the interarrival times are exponentially distributed (which corresponds to a Poisson process). The discrete version is of course the geometric distribution (which technically is what you actually used). This has been proven, which is why we always default to assuming exponential/geometric.
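The memoryless property mentioned above can be checked directly for the geometric model used in this thread. A small sketch, assuming the daily fault probability from the question:

```python
# Sketch: the geometric distribution's memoryless property, which
# justifies treating every fault-free day as a fresh Bernoulli trial.
p = 6 / 33  # assumed daily fault probability from the question

def survival(n, p=p):
    """P(no fault in the next n days) under a geometric model."""
    return (1 - p) ** n

# P(clean for n+m days | already clean for n days) equals
# P(clean for m days) -- the past streak carries no information.
n, m = 7, 10
conditional = survival(n + m) / survival(n)
assert abs(conditional - survival(m)) < 1e-12
```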

Commented:
Knowing the nature of the software might help a lot.
I can give you one example that brings you to a 0% probability.
Say the days { 10, 17, 18, 24, 33 } are year-end closing days for the financial books at a big company (I've removed day 1 because it doesn't seem to fall on the same day of the week as the others, and that problem might have been caused by testing, not production). The same conditions (data size, concurrency) will only appear again after 12 months.
