# Best way to normalize and weight Data?

I have some data I am trying to normalize/ weight. I have 4 regions, the number of people who have missing training certificates, and number of people in the region. Originally I was going to divide number of missing training certificate by number of people for each region to normalize the data. However, the data looks really small when I do that - like 100/40000. I don’t really want to graph such small numbers but need some way to bring in the number of people. Should I multiply by 100 and then just say this data is per 100 employees? Would that make sense?

Any other suggestions?
###### Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Technical Specialist/DeveloperCommented:
Is this what you are looking for? (\$A\$1:\$D\$1 is the range of values
=(A1-MIN(\$A\$1:\$D\$1))/(MAX(\$A\$1:\$D\$1)-MIN(\$A\$1:\$D\$1))
0
Senior DeveloperCommented:
First of all: There is no "best way". This would mean using a preferred scale. Which resembles bias.

Should I multiply by 100 and then just say this data is per 100 employees? Would that make sense?
Is the data based on only one company or many?

In the first case it is okay, cause you don't have (introduce) further relevant structure.

In the second case, mostly not. Cause it implies that your companies have at least an average of 100 employees. It also raises questions about distribution. How many companies do you have per region? How is the employee average per region?

BUT: you can multiply it by 100 for representing the result in percent or multiply it by 1000 for per mill.

Just my 2¢.
0
Senior DeveloperCommented:
Originally I was going to divide number of missing training certificate by number of people for each region to normalize the data

This is not normalization. Normalization would be the number of missing training certificate per region diveded by the total of the number  missing training certificate over all regions.
0
Senior DeveloperCommented:
And weighting would mean:

the number of missing training certificate per region multiplied by the number of people in the region devided by the total number of people all regions.
0
Author Commented:
Here is the data I am trying to weight relative to # of people in the region. I was going to do the following formula to employees per region into consideration.

Missing Certificates/ (ppl in region/sum(all ppl in region))

or

(Missing Certificates/ppl in region) *1000 - this would be missing certificates per 1,000 people

Does this make sense?
sample_data.xlsx
0
Senior DeveloperCommented:
I don't see what the first formula should represent. The second is just the (average) ratio per mill. So I think you're looking for the green numbers.

Weights can have any value. The most "natural" weight is the ratio number people own region to sum people all regions. The second "natural" weight is "inverse", the ratio sum of people of other regions to sum people all regions.
The weighted averages are an weighted arithmetic average.
The geometric average is a control number. The geometric average favors small numbers, thus it is useful when we think the error in larger groups is greater than in smaller groups.
sample_data.xlsx
0
Author Commented:
Hi Ste5an-

Thank you for the outputs. I am still a little confused. So I was doing the following:

1) #of missing certificates per person which I guess is the ratio. # of missing certs/# of ppl in region. Could I just multiply that by 1,000 to get # of missing  certs per 1,000 people in that region. I don't want to report such a small number like .0458.

2) Also, I am confused by the weighted section above. Not understanding by own people and other people. I am trying to figure out the best way to represent the number of missing certificates relative to headcount.
0
Senior DeveloperCommented:
1)
#of missing certificates per person which I guess is the ratio.
Not really, but I mean you the correct thing.

So numbers or occurrences or # of something means the direct measured value of something. Like the number of missing certificates or the number of people per region.

I was going to divide number of missing training certificate by number of people for each region [..]
This is the ratio in my above sample. More precisely:
#of missing certificates in region divided by # of people in region

Can you multiply it by 100 or 1000? Yes. Cause it simply means expressing the ratio in percent or per mill.
D'oh?  .0458 is definitively not a small number.

To  weight data, you need to apply a weight. The weight can be any number. Cause the normal arithmetic average is (x1 + .. + xN) / N, while the weighted arithmetic average is  (w1 x1 + .. +  wN xN) / ( w1 + .. + wN).
The weights I have chosen are the natural ones.

The first is simply weighted by number of people per region (P). Thus the largest region is the most important region in the weighted average.
The second weight is a kind "inverse" of that. It means the number of all other people (1-P). This means the smaller regions are more important.

You would use weights for example: You have only a small number of test personnel. Thus the larger the group is, the larger is the error they make. In this case you would chose weight 2 scenario to indicate that using smaller groups you would have an lesser average.
0

Experts Exchange Solution brought to you by

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Author Commented:
This was a great explanation!
0
###### It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Statistics

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.