Solved

Calculating Standard Deviation and Mean: Include zeros?

Posted on 2010-11-11
16
2,202 Views
Last Modified: 2012-05-10
A sample of 30 items were pulled from a universe of 323.
Each of the 30 rows has an "overpayment" amount. Values range from $0 to $7,000.
16 of 30 rows have an overpayment value > zero.
14 of 30 rows have an "overpayment" value = zero.

I need to calculate the Mean and Standard Deviation of the overpayment amount.
Are rows with a value of zero used in this calculation?

[My calculations done two different ways:]

Mean ($65,235.42 / 30 total items including zeros)                  $2,174.51
Mean Overpayment ($65,235.42 / 16 overpayments)                  $4,077.21
Standard Deviation (of 16 overpayments)                  1546.38
Standard Deviation (of all 30 rows including $0 "overpayments")                  2348.83
0
Comment
Question by:nicholasjwolf
  • 9
  • 4
  • 2
  • +1
16 Comments
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 34115799
This is application dependent. I would say that you would skip the 0 rows since there is no overpayment for those rows. But I would also ask the person who asked you the question which they intended.
0
 
LVL 37

Accepted Solution

by:
TommySzalapski earned 375 total points
ID: 34115818
If you are trying to find the average wait time at a checkout line, you would include 0s because the people who didn't have to wait at all still checked out. For overpayments though, I would give the percent of overpayments (53.33%) and report the average separately without the 0s.
0
 
LVL 92

Assisted Solution

by:Patrick Matthews
Patrick Matthews earned 125 total points
ID: 34115996
nicholasjwolf,Depends on what you are trying to do, as Tommy indicates.For example, do you want (mean / min / max / st dev / variance) for all payments, or just for those payments that included an overpayment?Indeed, assuming you might have some underpayments out there, then you may be best off basing your stats on the absolute value of the difference between the actual payment received and the expected payment.  For example, you could take the quadratic mean of difference.Admittedly, while I do not know all of your requirements, excluding the payments with no overage seems like a cheat to me.  If you must do this, then you had better start with a larger sample, because drawing any kind of conclusions on a sample size of 16 could be hazardous to your employment status :)Patrick
0
 

Author Comment

by:nicholasjwolf
ID: 34116310
Your comments are all helpful, but I am still trying to think this through.
The data I am viewing is a probe/discovery sample of 30 items (insurance claims) from a universe of 323.
Based on established criteria, it was determined by an expert that 16 out of 30 claims were overpaid.
We now need to determine a full sample size. For RAT-STATS I am reading: "Once the mean and standard deviation of the overpayment amount in the Discovery Sample have been calculated, the Full Sample size can be determined."
So, when calculating the mean and standard deviation of the overpayment, I don't know yet whether to include the $0 items.
0
 
LVL 37

Assisted Solution

by:TommySzalapski
TommySzalapski earned 375 total points
ID: 34116330
Then I would say you really should think about including the underpayments and report the average overpayment but let it be negative if, on average, it is underpaid.
0
 
LVL 37

Assisted Solution

by:TommySzalapski
TommySzalapski earned 375 total points
ID: 34116372
If the underpayments are truncated, then your report should look like this:

53.33% of payments were overpaid. Of those, the average overpayment was $4,077.21 with a standard deviation of $1,546.38
0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 34116391
Reporting the average truncating underpayments as 0 is falsifying data and it would be a legitimate lawsuit if you got sued over it. The average overpayment may well be negative if you are considering all of them. Of course, the bosses are probably trying to save a buck so they most likely only want to see the overpayments, in which case just be sure to not that that's what you are reporting on (like in my previous example)
0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 34116393
Hah, you must have accepted while I was typing. Oh, well. I didn't really add anything new.
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 
LVL 9

Expert Comment

by:shadow77
ID: 34116411
Because the zero values represent valid data, you should include them.  They are not missing data.  The actual overpayment amount is zero.  If the overpayment amount were $0.01 or $1, would you include that?

The only reason not to include the zero overpayments would be if you wanted to make some kind of contingent statement ("When a non-zero overpayment occurs, ...").  However, if you wish to assess the performance of  a payment program, the program should clearly get credit when it performs well (ie, overpayment = 0).  Similarly, if assessing performance is a goal, using absolute value in the case of an underpayment makes sense.
0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 34116541
Zero is not valid. Most of them should be negatives. The goal is to pay the right amout. If your goal was to get 0 for overpayments, you would pay $0 for all claims.
0
 
LVL 9

Expert Comment

by:shadow77
ID: 34116605
I don't understand that last statement.

If the overpayment amount is zero, that means exactly the right amount was paid.

If you pay zero, you are underpaying if the approved claim amount is non-zero.
0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 34116619
No, if the overpayment is 0 the claim could have been underpaid.
0
 
LVL 9

Expert Comment

by:shadow77
ID: 34116665
Are you assuming there are two non-negative fields, overpayment and underpayment, one or both of which would be zero?

I am assuming a single field, overpayment, which (theoretically) might have negative values.  I suppose it might be more accurate to call such a field paymenterror.

The original problem statement only mentioned one field (overpayment) and said there were no negative values.
0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 34116831
Right. That's what my point was. The underpayments were not reported and certainly should not be assumed to be 0. If you used 0 in the calculations, that's what you would be assuming which is why I said it would be false.
0
 

Author Comment

by:nicholasjwolf
ID: 34116889
Hi Shadow, thank you for your input.

I think this thread is showing me that I need to talk again with my colleague and make sure I understand our end result. :)
0
 
LVL 9

Expert Comment

by:shadow77
ID: 34116932
I see what you mean now.  If there were a second "underpayment" field, an overpayment of zero could mean either an underpayment or an accurate payment.

However, if there were only an overpayment field (with possible negative values for underpayments), then an overpayment of zero would mean there was no error (and it should be counted), and I would count the zeros.  I would also use absolute values and would label the amount as an "error", rather than as an "overpayment".

Impossible to decide without knowing more about the data and what's required.

0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Improved? Move/Copy Add-in Replacement - How to avoid the annoying, “A formula or sheet you want to move or copy contains the name XXX, which already exists on the destination worksheet.” David Miller (dlmille)  It was one of those days… I wa…
Freeze panes is an option within all variants of Excel to enable parts of a sheet to remain stationary when the cursor is in another part of the sheet. This is a very useful feature which is overlooked or under used.
The viewer will learn how to use the =DISCRINV command to create a discrete random variable, use this command to model a set of probabilities and outcomes in a Monte Carlo simulation, and learn how to find the standard deviation of a set of probabil…
This Micro Tutorial will demonstrate in Google Sheets how to use the HYPERLINK function to create live links inside your spreadsheet.

910 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now