Editor's Choice: This article has been selected by our editors as an exceptional contribution.

# Calculating Special Means in Microsoft Access Part 1: Weighted Average and Harmonic Mean

Published:
Updated:

## Introduction

One of the leading measures of central tendency is the average, or arithmetic mean.  Microsoft Access makes it easy to calculate the average for a data set by using the Avg aggregate function.  However, the arithmetic mean is not the only type of mean that exists, and for certain analyses, other types of means may be more appropriate.

This article will demonstrate how to calculate several two other useful, yet less-commonly used means, using Microsoft Access:

Part 2 of this article will extend the discussion to include the following other types of means:

Note: the SQL statements required for these “special” kinds of means are somewhat complex, but should be within the grasp of any intermediate or advanced Access user.  For that reason, I have not provided VBA versions of these functions.  Further, by relying solely on native Access functions, your application will exhibit faster performance.

## Sample File

For examples of all of the cases discussed below, please refer to the attached sample file:
Special-Means-Part-1.mdb
This file contains all of the data and query definitions used in the four numbered examples in this article, and you may find the file useful for extending these examples or creating your own new examples.

## Weighted Average

A weighted average, sometimes called the weighted mean, is similar to the arithmetic mean.  However, while with the arithmetic mean all members of the data set are given equal weight, in a weighted average the various members of the data set may have different weights, and thus have unequal influence in the result.

To calculate a weighted average for a data set {x1, x2, x3, …, x n} with weights {w1, w2, w3, …, w n}, sum the products of the value of each data set member and its weight, and divide by the sum of the weights:

None of the weights can be negative.  Further, while zero weights are allowed, the data set must include at least one non-zero weight.

Translating this into a generic SQL statement yields:

``````SELECT {group by columns if desired,} IIf(Min([WeightColumn]) >=0 And
Max([WeightColumn]) > 0, Sum([WeightColumn] * [ValueColumn]), Null) /
Sum([WeightColumn]) AS WtdAvg
FROM {tables}
{GROUP BY {group by columns}}
``````

Note: The denominator in this particular aggregate expression does not need its own IIf expression to protect against the possibility of having the sum of weights being zero (and thus causing an division-by-zero error).  If that is the case, the numerator will already be null, and since any arithmetic operation in which one of the operands is null always results in null, Access will not attempt to perform the division.
A weighted average is often used for computing an “average of averages”.  For example, consider the following sample results:

``````Sample  Count    Mean
---------------------
A         100   425.7
B         200   398.6
C         300   379.1
``````
If the three samples were combined, the overall mean would not be the average of the three sample means: Since the three samples have an unequal number of members, the samples with more members should be given greater weight.  Thus, the weighted average is:

``````Sample  Count    Mean   Contribution
--------------------------------------------------
A         100   425.7   100 * 425.7 / 600 =  70.95
B         200   398.6   200 * 398.8 / 600 = 132.93
C         300   379.1   300 * 379.1 / 600 = 189.55
--------------------------------------------------
Result:                                     393.43
``````

Example 1: qryWeightedAvg_RegionDistrict
In the attached sample file, please refer to the query qryWeightedAvg_RegionDistr ict.

In this example, a company has conducted a series of customer surveys across its regions and districts.  The table tblWeightedAverage holds the results of each survey, noting the Region and District that participated, the number of customers surveyed (Weight), and the average score.  Our company now wishes to aggregate the survey results.

Since this is, in effect, an “average of averages” for which the number of data elements can vary, a weighted average is the only way to aggregate the results correctly.  Implementing the formula above yields the following SQL statements:

``````By Region:

SELECT Region, IIf(Min([Weight]) >= 0 And Max([Weight]) > 0,
Sum([Weight] * [Value]), Null) / Sum([Weight]) AS WeightedAvg
FROM tblWeightedAverage
GROUP BY Region
ORDER BY Region;
``````
``````By District:

SELECT District, IIf(Min([Weight]) >= 0 And Max([Weight]) > 0,
Sum([Weight] * [Value]), Null) / Sum([Weight]) AS WeightedAvg
FROM tblWeightedAverage
GROUP BY District
ORDER BY District;
``````
``````By Region and District:

SELECT Region, District, IIf(Min([Weight]) >= 0 And Max([Weight]) > 0,
Sum([Weight] * [Value]), Null) / Sum([Weight]) AS WeightedAvg
FROM tblWeightedAverage
GROUP BY Region, District
ORDER BY Region, District;
``````

Those queries return the following results:

Instead of a straight aggregation, I used a conditional aggregation.  The conditions are set up to reject any data sets with negative weights, or data sets with no positive weights.  If a data set violates either or both of these conditions, the result is Null.
If there is a Null weight for any member of a data set, that member has no impact on the resulting weighted average.
If the value is null but the weight is not null, the aggregation is effectively treating that value as zero: that member does not contribute to the numerator, and yet its weight still contributes to the denominator.

In this data set, I specifically set up a few items to test these conditions:

Region A, District 1 has an item with Weight = -2.  Since weights cannot be negative, this makes the results for that region/district Null.
Region B, District 3 has an item with a Null weight.  That causes the corresponding value (97.4) to be ignored in the weighted average.
Region C, District 5 has an item with a Null Value (but non-null Weight).  This item is included in the results, being treated as if the Value were zero.
Both items for Region E, District 9 have zero Weights, thus causing the result to be Null.

In this example, students’ grades for a particular course are a function of their grades on the four quizzes, two papers, midterm exam, and final exam.  Since the instructor places varying degrees of emphasis on these different items, each has a different weight:

Each of the four quizzes counts for 5% of the grade.
Each of the two papers counts for 10% of the grade.
The midterm exam counts for 20% of the grade.
The final exam counts for the remaining 40% of the grade.
Assuming that each item is graded on a 100-point scale, each student’s final grade will be a weighted average of his/her individual scores.

This example constitutes a special case of the weighted average: if the sum of weights is equal to 1, then the weighted average is simply the sum of the products of the individual values and their weights.  This simplifies the SQL statement, as it is no longer necessary to include the sum of the weights as the denominator in the aggregate expression.  (Division by one always returns the dividend.)

Thus, to determine our students’ grades, run the following query:

``````SELECT s.Lname, s.Fname, s.Mname, IIf(Min(e.Weight) >= 0 And Max(e.Weight) > 0,
Sum(e.Weight * m.Score), Null) AS Grade
FROM (tblExams AS e INNER JOIN
tblStudentMarks AS m ON e.ID = m.ExamID) INNER JOIN
tblStudents AS s ON m.StudentID = s.ID
GROUP BY s.Lname, s.Fname, s.Mname;
``````
That query returns the following results:

As in Example 1, we test the weights to ensure that there are no negative weights, and that there is at least one positive weight.
If any student has a “missing” grade in tblStudentMarks, the effect is the same as if s/he received a zero for that assignment.
The SQL statement above assumes that the sum of the weights is in fact one.  If for any reason that is not actually the case, then the calculated weighted average will be incorrect.

## Harmonic Mean

The harmonic mean is mainly used for computing an average of rates or ratios.  Mathematically, the harmonic mean of a data set {x1, x2, x3, …, x n} is the reciprocal of the average of the reciprocals of the items in the data set:

You cannot calculate the harmonic mean if the data set includes any zero or negative values, and thus any SQL statement used to calculate the harmonic mean must test to ensure that only positive numbers are included in the data set, and also must guard against the possibility of a division by zero error.

Translating this into a generic SQL statement yields:

``````SELECT {group by columns if desired,} IIf(Min([ValueColumn]) > 0,
1 / Avg(1 / IIf([ValueColumn] <> 0, [ValueColumn], Null)), Null) AS HarmMean
FROM {tables}
{GROUP BY {group by columns}}
``````
Note: The denominator in the Avg expression must have its own IIf expression to avoid any possible zero value in the ValueColumn.  This is because Access always evaluates both the “if true” and “if false” parts of an IIf expression.  If we leave out that embedded IIf expression and the data set does include a zero value, Access will be unable to run the query and will return a division by zero error message.
As long as there is at least one pair of unequal values in the data set, the harmonic mean will always be less than the arithmetic mean.

A classic example for the harmonic mean involves averaging speeds over a given distance.  For example, if a person drives for one hour at 100 km/hr, and then for one hour at 50 km/hr, the average speed for the trip is the arithmetic mean of the two speeds, or 75 km/hr.

However, if instead we say that we drove for 100 km at 100 km/hr, and then drove for another 100 km at 50 km/hr, the average speed for the trip will be the harmonic mean:

``````Harmonic Mean = 1 / ([1/100 + 1/50] / 2) = 66.67 km/hr

Validation:

Time to travel 1st 100 km: 100km / 100km/hr = 1 hr
Time to travel 2nd 100 km: 100km / 50km/hr =  2 hr
Total travel time: 3 hr
Total distance: 200 km
Average speed = 200km / 3 hr = 66.67 km/hr
``````

Example 3: qryHarmMean
In the attached sample file, please refer to the query qryHarmMean.

In this example, two drivers have both completed ten circuits of uniform length over a particular course.  For each circuit, the drivers’ average speeds are recorded in the table tblHarmMean.  We now wish to compute each driver’s overall average speed across all circuits.

In this scenario, the drivers’ average speeds will be the harmonic means of their speeds in each circuit.  The appropriate SQL statement is:

``````SELECT Driver, IIf(Min([Speed]) > 0, 1 / Avg(1 / IIf([Speed] <> 0, [Speed], Null)), Null) AS HarmMean
FROM tblHarmMean
GROUP BY Driver;
``````

That query returns the following results:

Driver B’s effective average speed cannot be calculated: the average speed for one of Driver B’s circuits is listed as zero.  The SQL statement above will return a Null result for any data set for which any item is less than or equal to zero, and it will ignore null values altogether.

## Weighted Harmonic Mean

In the section above, we saw how to take an un-weighted harmonic mean.  You may also calculate a weighted harmonic mean.  For a data set with values {x1, x2, x3, …, x n} and weights {w1, w2, w3, …, w n}, the weighted harmonic mean is:

As in the simple harmonic mean, the values must all be greater than zero.  Also, as with the weighted average, the weights cannot be negative, and there must be at least one non-zero weight.  Any SQL statement used for calculating the weighted harmonic mean must test for those conditions.

Translating this into a generic SQL statement yields:

``````SELECT {group by columns if desired,} IIf(Min([ValueColumn]) > 0 And
Min([WeightColumn]) >=0 And Max([WeightColumn]) > 0,
Sum([WeightColumn]) / Sum([WeightColumn] /
IIf([ValueColumn] = 0, Null, [ValueColumn])), Null) AS WtdHarmMean
FROM {tables}
{GROUP BY {group by columns}}
``````

Note: As with the weighted average, we must test for having no negative weights, and at least one positive weight.  As with the unweighted harmonic mean, the denominator in the Avg expression must have its own IIf expression to escape any possible zero value in the ValueColumn, which would create a division by zero error.
For example, in finance, a common measure of the relative value of a company’s stock is the price to earnings ratio, or PE. A relatively low PE indicates either a relatively under-valued stock or a company expected to have low or even negative future earnings growth, while a relatively high PE indicates either an over-valued stock or a company expected to have robust future earnings growth.

Consider the following companies, for which we want to determine the aggregate PE:

``````Company     PE Ratio
--------------------
A              12.50
B              14.29
C              20.00
D              50.00
``````
If the companies have similar market values, then the “average” PE would be the harmonic means of the individual PE ratios.  However, if the companies’ market value are not similar, this would not be appropriate.  Suppose that the companies’ market values were as follows:

``````Company   Market Capitalization   PE Ratio
------------------------------------------
A                \$5,000,000,000      12.50
B                \$2,000,000,000      14.29
C                  \$200,000,000      20.00
D                   \$20,000,000      50.00
``````
Company A comprises almost 70% of the total market value of the four companies, and as such its PE should be accorded greater weight than the others’.  Thus, we should use the weighted harmonic mean.

Normalizing the weights by dividing each by 20,000,000 results in:

``````Company   Weight   PE Ratio
---------------------------
A            250      12.50
B            100      14.29
C             10      20.00
D              1      50.00
``````
Thus, the weighted harmonic mean is:
``````Wtd Harm Mean = (250 + 100 + 10 + 1) / (250/12.5 + 100/14.29 + 10/20 + 1/50) =
361 / (20 + 7 + 0.5 + 0.02) = 13.12
``````

Example 4: qryHarmMean_Weighted
In the attached sample file, please refer to the query qryHarmMean_Weighted.

This example is similar to Example 3, except that this time the circuits recorded for each driver are not necessarily all the same length.  For example, suppose that the circuit lengths and average speeds are as follows:

In this case, we cannot simply take the harmonic means of each driver’s speeds.  Instead, we must find the weighted harmonic means, using the distance covered for each circuit as the weight.  To do this, use the following SQL statement:

``````SELECT Driver, IIf(Min([Speed]) > 0 And Min([Distance]) >=0 And Max([Distance]) > 0,
Sum([Distance]) / Sum([Distance] / IIf([Speed] = 0, Null, [Speed])), Null) AS WtdHarmMean
FROM tblHarmMeanWeighted
GROUP BY Driver;
``````
That query returns the following results:

## To Be Continued…

Please be sure to continue on to Part 2 of this article, which will extend the discussion to include the following other types of means, and how to calculate them using Microsoft Access:

In addition, if you are interested in other measures of central tendency, please refer to this article that discusses calculating the median and mode in Microsoft Access.

=-=-=-=-=-=-=-=-=-=-=-=-=- =-=-=-=-=- =-=-=-=-=- =-=-=-=-=- =-=-=-=-=- =-=-=-=-=- =-=-=