Avatar of Nick Wolf
Nick WolfFlag for United States of America asked on

Calculating Standard Deviation and Mean: Include zeros?

A sample of 30 items were pulled from a universe of 323.
Each of the 30 rows has an "overpayment" amount. Values range from $0 to $7,000.
16 of 30 rows have an overpayment value > zero.
14 of 30 rows have an "overpayment" value = zero.

I need to calculate the Mean and Standard Deviation of the overpayment amount.
Are rows with a value of zero used in this calculation?

[My calculations done two different ways:]

Mean ($65,235.42 / 30 total items including zeros)                  $2,174.51
Mean Overpayment ($65,235.42 / 16 overpayments)                  $4,077.21
Standard Deviation (of 16 overpayments)                  1546.38
Standard Deviation (of all 30 rows including $0 "overpayments")                  2348.83
Math / ScienceMicrosoft Excel

Avatar of undefined
Last Comment
shadow77

8/22/2022 - Mon
TommySzalapski

This is application dependent. I would say that you would skip the 0 rows since there is no overpayment for those rows. But I would also ask the person who asked you the question which they intended.
ASKER CERTIFIED SOLUTION
TommySzalapski

Log in or sign up to see answer
Become an EE member today7-DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform
Sign up - Free for 7 days
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.
See how we're fighting big data
Not exactly the question you had in mind?
Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.
ask a question
SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ASKER
Nick Wolf

Your comments are all helpful, but I am still trying to think this through.
The data I am viewing is a probe/discovery sample of 30 items (insurance claims) from a universe of 323.
Based on established criteria, it was determined by an expert that 16 out of 30 claims were overpaid.
We now need to determine a full sample size. For RAT-STATS I am reading: "Once the mean and standard deviation of the overpayment amount in the Discovery Sample have been calculated, the Full Sample size can be determined."
So, when calculating the mean and standard deviation of the overpayment, I don't know yet whether to include the $0 items.
SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
TommySzalapski

Reporting the average truncating underpayments as 0 is falsifying data and it would be a legitimate lawsuit if you got sued over it. The average overpayment may well be negative if you are considering all of them. Of course, the bosses are probably trying to save a buck so they most likely only want to see the overpayments, in which case just be sure to not that that's what you are reporting on (like in my previous example)
Your help has saved me hundreds of hours of internet surfing.
fblack61
TommySzalapski

Hah, you must have accepted while I was typing. Oh, well. I didn't really add anything new.
shadow77

Because the zero values represent valid data, you should include them.  They are not missing data.  The actual overpayment amount is zero.  If the overpayment amount were $0.01 or $1, would you include that?

The only reason not to include the zero overpayments would be if you wanted to make some kind of contingent statement ("When a non-zero overpayment occurs, ...").  However, if you wish to assess the performance of  a payment program, the program should clearly get credit when it performs well (ie, overpayment = 0).  Similarly, if assessing performance is a goal, using absolute value in the case of an underpayment makes sense.
TommySzalapski

Zero is not valid. Most of them should be negatives. The goal is to pay the right amout. If your goal was to get 0 for overpayments, you would pay $0 for all claims.
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
shadow77

I don't understand that last statement.

If the overpayment amount is zero, that means exactly the right amount was paid.

If you pay zero, you are underpaying if the approved claim amount is non-zero.
TommySzalapski

No, if the overpayment is 0 the claim could have been underpaid.
shadow77

Are you assuming there are two non-negative fields, overpayment and underpayment, one or both of which would be zero?

I am assuming a single field, overpayment, which (theoretically) might have negative values.  I suppose it might be more accurate to call such a field paymenterror.

The original problem statement only mentioned one field (overpayment) and said there were no negative values.
This is the best money I have ever spent. I cannot not tell you how many times these folks have saved my bacon. I learn so much from the contributors.
rwheeler23
TommySzalapski

Right. That's what my point was. The underpayments were not reported and certainly should not be assumed to be 0. If you used 0 in the calculations, that's what you would be assuming which is why I said it would be false.
ASKER
Nick Wolf

Hi Shadow, thank you for your input.

I think this thread is showing me that I need to talk again with my colleague and make sure I understand our end result. :)
shadow77

I see what you mean now.  If there were a second "underpayment" field, an overpayment of zero could mean either an underpayment or an accurate payment.

However, if there were only an overpayment field (with possible negative values for underpayments), then an overpayment of zero would mean there was no error (and it should be counted), and I would count the zeros.  I would also use absolute values and would label the amount as an "error", rather than as an "overpayment".

Impossible to decide without knowing more about the data and what's required.

Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.