Browse All Articles > Calculating Special Means in Microsoft Access Part 1: Weighted Average and Harmonic Mean

This article will demonstrate how to calculate several two other useful, yet less-commonly used means, using Microsoft Access:

Special-Means-Part-1.mdb

This file contains all of the data and query definitions used in the four numbered examples in this article, and you may find the file useful for extending these examples or creating your own new examples.

To calculate a weighted average for a data set {x1, x2, x3, …, x

None of the weights can be negative. Further, while zero weights are allowed, the data set must include at least one non-zero weight.

Translating this into a generic SQL statement yields:

```
SELECT {group by columns if desired,} IIf(Min([WeightColumn]) >=0 And
Max([WeightColumn]) > 0, Sum([WeightColumn] * [ValueColumn]), Null) /
Sum([WeightColumn]) AS WtdAvg
FROM {tables}
{GROUP BY {group by columns}}
```

```
Sample Count Mean
---------------------
A 100 425.7
B 200 398.6
C 300 379.1
```

If the three samples were combined, the overall mean would ```
Sample Count Mean Contribution
--------------------------------------------------
A 100 425.7 100 * 425.7 / 600 = 70.95
B 200 398.6 200 * 398.8 / 600 = 132.93
C 300 379.1 300 * 379.1 / 600 = 189.55
--------------------------------------------------
Result: 393.43
```

In this example, a company has conducted a series of customer surveys across its regions and districts. The table tblWeightedAverage holds the results of each survey, noting the Region and District that participated, the number of customers surveyed (Weight), and the average score. Our company now wishes to aggregate the survey results.

Since this is, in effect, an “average of averages” for which the number of data elements can vary, a weighted average is the only way to aggregate the results correctly. Implementing the formula above yields the following SQL statements:

```
By Region:
SELECT Region, IIf(Min([Weight]) >= 0 And Max([Weight]) > 0,
Sum([Weight] * [Value]), Null) / Sum([Weight]) AS WeightedAvg
FROM tblWeightedAverage
GROUP BY Region
ORDER BY Region;
```

```
By District:
SELECT District, IIf(Min([Weight]) >= 0 And Max([Weight]) > 0,
Sum([Weight] * [Value]), Null) / Sum([Weight]) AS WeightedAvg
FROM tblWeightedAverage
GROUP BY District
ORDER BY District;
```

```
By Region and District:
SELECT Region, District, IIf(Min([Weight]) >= 0 And Max([Weight]) > 0,
Sum([Weight] * [Value]), Null) / Sum([Weight]) AS WeightedAvg
FROM tblWeightedAverage
GROUP BY Region, District
ORDER BY Region, District;
```

Those queries return the following results:

Please note the following about this aggregation:

Instead of a straight aggregation, I used a conditional aggregation. The conditions are set up to reject any data sets with negative weights, or data sets with no positive weights. If a data set violates either or both of these conditions, the result is Null.

If there is a Null weight for any member of a data set, that member has no impact on the resulting weighted average.

If the value is null but the weight is not null, the aggregation is effectively treating that value as zero: that member does not contribute to the numerator, and yet its weight still contributes to the denominator.

In this data set, I specifically set up a few items to test these conditions:

Region A, District 1 has an item with Weight = -2. Since weights cannot be negative, this makes the results for that region/district Null.

Region B, District 3 has an item with a Null weight. That causes the corresponding value (97.4) to be ignored in the weighted average.

Region C, District 5 has an item with a Null Value (but non-null Weight). This item is included in the results, being treated as if the Value were zero.

Both items for Region E, District 9 have zero Weights, thus causing the result to be Null.

In this example, students’ grades for a particular course are a function of their grades on the four quizzes, two papers, midterm exam, and final exam. Since the instructor places varying degrees of emphasis on these different items, each has a different weight:

Each of the four quizzes counts for 5% of the grade.

Each of the two papers counts for 10% of the grade.

The midterm exam counts for 20% of the grade.

The final exam counts for the remaining 40% of the grade.

Assuming that each item is graded on a 100-point scale, each student’s final grade will be a weighted average of his/her individual scores.
This example constitutes a special case of the weighted average: if the sum of weights is equal to 1, then the weighted average is simply the sum of the products of the individual values and their weights. This simplifies the SQL statement, as it is no longer necessary to include the sum of the weights as the denominator in the aggregate expression. (Division by one always returns the dividend.)

Thus, to determine our students’ grades, run the following query:

```
SELECT s.Lname, s.Fname, s.Mname, IIf(Min(e.Weight) >= 0 And Max(e.Weight) > 0,
Sum(e.Weight * m.Score), Null) AS Grade
FROM (tblExams AS e INNER JOIN
tblStudentMarks AS m ON e.ID = m.ExamID) INNER JOIN
tblStudents AS s ON m.StudentID = s.ID
GROUP BY s.Lname, s.Fname, s.Mname;
```

That query returns the following results:
Please note the following:

As in Example 1, we test the weights to ensure that there are no negative weights, and that there is at least one positive weight.

If any student has a “missing” grade in tblStudentMarks, the effect is the same as if s/he received a zero for that assignment.

The SQL statement above assumes that the sum of the weights is in fact one. If for any reason that is not actually the case, then the calculated weighted average will be incorrect.

You cannot calculate the harmonic mean if the data set includes any zero or negative values, and thus any SQL statement used to calculate the harmonic mean must test to ensure that only positive numbers are included in the data set, and also must guard against the possibility of a division by zero error.

Translating this into a generic SQL statement yields:

```
SELECT {group by columns if desired,} IIf(Min([ValueColumn]) > 0,
1 / Avg(1 / IIf([ValueColumn] <> 0, [ValueColumn], Null)), Null) AS HarmMean
FROM {tables}
{GROUP BY {group by columns}}
```

A classic example for the harmonic mean involves averaging speeds over a given distance. For example, if a person drives for one hour at 100 km/hr, and then for one hour at 50 km/hr, the average speed for the trip is the arithmetic mean of the two speeds, or 75 km/hr.

However, if instead we say that we drove for 100 km at 100 km/hr, and then drove for another 100 km at 50 km/hr, the average speed for the trip will be the harmonic mean:

```
Harmonic Mean = 1 / ([1/100 + 1/50] / 2) = 66.67 km/hr
Validation:
Time to travel 1st 100 km: 100km / 100km/hr = 1 hr
Time to travel 2nd 100 km: 100km / 50km/hr = 2 hr
Total travel time: 3 hr
Total distance: 200 km
Average speed = 200km / 3 hr = 66.67 km/hr
```

In this example, two drivers have both completed ten circuits of uniform length over a particular course. For each circuit, the drivers’ average speeds are recorded in the table tblHarmMean. We now wish to compute each driver’s overall average speed across all circuits.

In this scenario, the drivers’ average speeds will be the harmonic means of their speeds in each circuit. The appropriate SQL statement is:

```
SELECT Driver, IIf(Min([Speed]) > 0, 1 / Avg(1 / IIf([Speed] <> 0, [Speed], Null)), Null) AS HarmMean
FROM tblHarmMean
GROUP BY Driver;
```

That query returns the following results:

As in the simple harmonic mean, the values must all be greater than zero. Also, as with the weighted average, the weights cannot be negative, and there must be at least one non-zero weight. Any SQL statement used for calculating the weighted harmonic mean must test for those conditions.

Translating this into a generic SQL statement yields:

```
SELECT {group by columns if desired,} IIf(Min([ValueColumn]) > 0 And
Min([WeightColumn]) >=0 And Max([WeightColumn]) > 0,
Sum([WeightColumn]) / Sum([WeightColumn] /
IIf([ValueColumn] = 0, Null, [ValueColumn])), Null) AS WtdHarmMean
FROM {tables}
{GROUP BY {group by columns}}
```

Consider the following companies, for which we want to determine the aggregate PE:

```
Company PE Ratio
--------------------
A 12.50
B 14.29
C 20.00
D 50.00
```

If the companies have similar market values, then the “average” PE would be the harmonic means of the individual PE ratios. However, if the companies’ market value are not similar, this would not be appropriate. Suppose that the companies’ market values were as follows:
```
Company Market Capitalization PE Ratio
------------------------------------------
A $5,000,000,000 12.50
B $2,000,000,000 14.29
C $200,000,000 20.00
D $20,000,000 50.00
```

Company A comprises almost 70% of the total market value of the four companies, and as such its PE should be accorded greater weight than the others’. Thus, we should use the weighted harmonic mean.
Normalizing the weights by dividing each by 20,000,000 results in:

```
Company Weight PE Ratio
---------------------------
A 250 12.50
B 100 14.29
C 10 20.00
D 1 50.00
```

Thus, the weighted harmonic mean is:
```
Wtd Harm Mean = (250 + 100 + 10 + 1) / (250/12.5 + 100/14.29 + 10/20 + 1/50) =
361 / (20 + 7 + 0.5 + 0.02) = 13.12
```

This example is similar to

In this case, we cannot simply take the harmonic means of each driver’s speeds. Instead, we must find the weighted harmonic means, using the distance covered for each circuit as the weight. To do this, use the following SQL statement:

```
SELECT Driver, IIf(Min([Speed]) > 0 And Min([Distance]) >=0 And Max([Distance]) > 0,
Sum([Distance]) / Sum([Distance] / IIf([Speed] = 0, Null, [Speed])), Null) AS WtdHarmMean
FROM tblHarmMeanWeighted
GROUP BY Driver;
```

That query returns the following results:
In addition, if you are interested in other measures of central tendency, please refer to this article that discusses calculating the median and mode in Microsoft Access.

=-=-=-=-=-=-=-=-=-=-=-=-=-

Was this article helpful?

label that is just below and to the right of this text.

=-=-=-=-=-=-=-=-=-=-=-=-=-

## Comments (0)