Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

I have some data I am trying to normalize/ weight. I have 4 regions, the number of people who have missing training certificates, and number of people in the region. Originally I was going to divide number of missing training certificate by number of people for each region to normalize the data. However, the data looks really small when I do that - like 100/40000. I don’t really want to graph such small numbers but need some way to bring in the number of people. Should I multiply by 100 and then just say this data is per 100 employees? Would that make sense?

Any other suggestions?

Any other suggestions?

Experts Exchange Solution brought to you by

Enjoy your complimentary solution view.

Get every solution instantly with premium.
Start your 7-day free trial.

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Should I multiply by 100 and then just say this data is per 100 employees? Would that make sense?Is the data based on only one company or many?

In the first case it is okay, cause you don't have (introduce) further relevant structure.

In the second case, mostly not. Cause it implies that your companies have at least an average of 100 employees. It also raises questions about distribution. How many companies do you have per region? How is the employee average per region?

BUT: you can multiply it by 100 for representing the result in percent or multiply it by 1000 for per mill.

Just my 2¢.

Originally I was going to divide number of missing training certificate by number of people for each region to normalize the data

This is not normalization. Normalization would be the number of missing training certificate per region diveded by the total of the number missing training certificate over all regions.

the number of missing training certificate per region multiplied by the number of people in the region devided by the total number of people all regions.

Missing Certificates/ (ppl in region/sum(all ppl in region))

or

(Missing Certificates/ppl in region) *1000 - this would be missing certificates per 1,000 people

Does this make sense?

sample_data.xlsx

Weights can have any value. The most "natural" weight is the ratio number people own region to sum people all regions. The second "natural" weight is "inverse", the ratio sum of people of other regions to sum people all regions.

The weighted averages are an weighted arithmetic average.

The geometric average is a control number. The geometric average favors small numbers, thus it is useful when we think the error in larger groups is greater than in smaller groups.

sample_data.xlsx

Thank you for the outputs. I am still a little confused. So I was doing the following:

1) #of missing certificates per person which I guess is the ratio. # of missing certs/# of ppl in region. Could I just multiply that by 1,000 to get # of missing certs per 1,000 people in that region. I don't want to report such a small number like .0458.

2) Also, I am confused by the weighted section above. Not understanding by own people and other people. I am trying to figure out the best way to represent the number of missing certificates relative to headcount.

#of missing certificates per person which I guess is the ratio.Not really, but I mean you the correct thing.

So numbers or occurrences or # of something means the direct measured value of something. Like the number of missing certificates or the number of people per region.

I was going to divide number of missing training certificate by number of people for each region [..]This is the ratio in my above sample. More precisely:

Can you multiply it by 100 or 1000? Yes. Cause it simply means expressing the ratio in percent or per mill.

D'oh? .0458 is definitively not a small number.

2) your question was about "and weight Data?"

To weight data, you need to apply a weight. The weight can be any number. Cause the normal arithmetic average is (x1 + .. + xN) / N, while the weighted arithmetic average is (w1 x1 + .. + wN xN) / ( w1 + .. + wN).

The weights I have chosen are the natural ones.

The first is simply weighted by number of people per region (P). Thus the largest region is the most important region in the weighted average.

The second weight is a kind "inverse" of that. It means the number of all other people (1-P). This means the smaller regions are more important.

You would use weights for example: You have only a small number of test personnel. Thus the larger the group is, the larger is the error they make. In this case you would chose weight 2 scenario to indicate that using smaller groups you would have an lesser average.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Statistics

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Experts Exchange Solution brought to you by

Enjoy your complimentary solution view.

Get every solution instantly with premium.
Start your 7-day free trial.

Open in new window