Statistical test to find within class similarity

Zizi used Ask the Experts™

I need your help finding the name of a statistical test. I used Welch's t-test to find the variance between classes; I chose it because the classes were of different sizes. However, I now need to find within-class similarity. I would appreciate it if you could advise on the name of the test.


Does the correlation coefficient fit your needs?


Sorry for the late response. Would this work with unequal sample sizes/variances? Thanks
Sorry, the sample sizes are the same.

>> I now need to find within class similarity
If this is the objective then what does "unequal sample size/variance" mean?
And, what sort of "similarity" measure is needed?
Collect 200 apples from one field and weigh each one to get the mean weight and the weight variance. Do the same for another field, except that you collect only 150 apples.

For the two collections, the sample size, the mean weight, and the weight variance are all different.


Thanks @phoffric and @Fred

@phoffric, would this by any chance help to reduce the variance? It seems the answer is yes ... some normalisation is applied to the data so that it can be used ... I found this from:



Using your example, is it possible to compute a correlation between 150 of the 200 apples collected from field 1 and the 150 apples from field 2? I would do this test twice: in round 2, I take 150 apples from field 1 again, making sure that the 50 apples that were not tested in round 1 are included, and correlate them with the 150 apples from field 2. Is this statistically acceptable? Thanks!
>> need to find within class similarity.
Could you clarify or expand upon this?

>> to reduce the variance

One way to reduce the variance of a random variable is to remove the outliers. For example, if your model is a Gaussian distribution, you pick a threshold factor and discard all points that lie farther from the mean than the threshold factor times the standard deviation. You then recalculate the mean and variance and repeat until their changes fall below your desired thresholds.
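The iterative outlier removal described above can be sketched roughly as follows. This is only an illustration: the function name `sigma_clip`, the parameters `k`, `tol` and `max_iter`, and the apple weights are all made up for the example, and the population standard deviation is an arbitrary choice.

```python
import statistics

def sigma_clip(values, k=3.0, tol=1e-9, max_iter=100):
    """Repeatedly drop points farther than k standard deviations from the mean.

    k, tol and max_iter are hypothetical tuning knobs, not values from the thread.
    """
    kept = list(values)
    mean = statistics.fmean(kept)
    std = statistics.pstdev(kept)
    for _ in range(max_iter):
        if std == 0:
            break  # all remaining points are identical
        trimmed = [v for v in kept if abs(v - mean) <= k * std]
        if len(trimmed) < 2:
            break  # refuse to clip the data away entirely
        new_mean = statistics.fmean(trimmed)
        new_std = statistics.pstdev(trimmed)
        # Stop once the mean and standard deviation no longer change.
        converged = abs(new_mean - mean) < tol and abs(new_std - std) < tol
        kept, mean, std = trimmed, new_mean, new_std
        if converged:
            break
    return kept, mean, std

# Illustrative apple weights in grams, with one obvious outlier (400).
weights = [150, 152, 149, 151, 148, 153, 150, 400]
kept, mean, std = sigma_clip(weights, k=2.0)
print(kept, mean, std)
```

Whether clipping is appropriate depends on whether the discarded points really are measurement errors and not genuine members of the class.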

>> Do this test twice, where round 2, I take 150 apples from field 1 by ensuring the previously 50 apples that was not tested in round 1.

This is close to a standard requirement to verify that cluster analysis is accurate. But you are expected to perform many tests by randomly permuting your input data and then make the selection as you described, to confirm that all the results are close. An inadequate separation of two clusters may be discovered by repeating the random permutation sample sets many times.
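A rough sketch of the repeated random selection might look like this. The data are synthetic, and the statistic (difference of the sample means), the helper names and the number of rounds are assumptions chosen for illustration; substitute whatever measure you are actually testing.

```python
import random
import statistics

def repeated_subsample(field1, field2, stat, rounds=200, seed=0):
    """Draw equal-sized random subsets many times and collect a statistic."""
    rng = random.Random(seed)
    n = min(len(field1), len(field2))  # e.g. 150 out of 200 apples
    results = []
    for _ in range(rounds):
        a = rng.sample(field1, n)  # random subset without replacement
        b = rng.sample(field2, n)
        results.append(stat(a, b))
    return results

def mean_difference(a, b):
    """Hypothetical statistic: difference between the two sample means."""
    return statistics.fmean(a) - statistics.fmean(b)

# Synthetic apple weights; the means and spreads are invented for the example.
data_rng = random.Random(42)
field1 = [data_rng.gauss(150, 10) for _ in range(200)]
field2 = [data_rng.gauss(170, 12) for _ in range(150)]

results = repeated_subsample(field1, field2, mean_difference)
spread = statistics.pstdev(results)
print(min(results), max(results), spread)
```

If the results vary widely from round to round, the conclusion depends on which apples happened to be picked, which is the warning sign described above.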

>> some normalisation is applied to the data

This normalization is part of the definition and is required to keep the correlation coefficient in the interval -1..+1. In this way the units of the data can be different without one kind of unit dominating the results.

A value of +1.0 means that as one random variable increases, the other increases with it in an exactly linear way. It doesn't mean that they have the same mean and variance.
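To make this concrete, here is a small sketch that computes the Pearson correlation coefficient by hand. The helper `pearson_r` and the numbers are illustrative only. The two series have very different means and variances, yet the coefficient is still +1 because one is an exact increasing linear function of the other; rescaling one series (a change of units) does not change the result.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance normalised by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0 + 3.0 * v for v in x]        # different mean and variance from x
x_rescaled = [1000.0 * v for v in x]   # same data in different units

r = pearson_r(x, y)
r_rescaled = pearson_r(x_rescaled, y)
print(r, r_rescaled)
```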

How did you identify the two classes (clusters)?
