Community Pick: Many members of our community have endorsed this article.
Editor's Choice: This article has been selected by our editors as an exceptional contribution.

Calculating Special Means in Microsoft SQL Server Part 2: Geometric Mean and Quadratic Mean

Patrick Matthews
CERTIFIED EXPERT
Published:
Updated:

Introduction



While the average, or arithmetic mean, is probably the most commonly used measure of central tendency, it is certainly not the only such statistic.

In my article Calculating Special Means in Microsoft SQL Server Part 1, I discussed how to calculate the weighted average and harmonic mean.  This article will demonstrate how to calculate several two other useful, yet less commonly used means, using Microsoft SQL Server:


Indeed, all four of those means, as well as the standard arithmetic mean, minimum, and maximum, are all specific cases of the generalized power mean.

Note: Most of the material in this article originally appeared in one of my earlier articles, Calculating Special Means in Microsoft Access Part 2: Geometric Mean and Quadratic Mean .  For this article, I have adapted the text, examples, and SQL code to reflect proper Transact-SQL syntax.


Sample Data



To facilitate the examples in this article, I used the following T-SQL batch to create and populate three tables:

tblGeomMean
tblWeightedGeomMean
tblQuadMean

If you wish to follow along with the examples on your own, please execute the batch in SQL Query Analyzer (SQL Server 2000), or in the SQL Server Management Studio (SQL Server 2005 or later).

/* This script creates and populates the tables needed for the examples in the Special
                      Means Part 2 article */
                      
                      CREATE TABLE tblGeomMean (ID int IDENTITY(1, 1), Investment varchar(20), Period int, 
                          AnnualReturn decimal(9, 6))
                      
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('A', 1, 0.0619)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('A', 2, 0.0468)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('A', 3, -0.0283)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('A', 4, -0.1096)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('A', 5, 0.0212)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('A', 6, -0.0721)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('A', 7, -0.0443)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('A', 8, 0.0117)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('A', 9, 0.0723)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('A', 10, 0.0166)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('B', 1, 0.0342)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('B', 2, 0.0721)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('B', 3, -0.03)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('B', 4, 0.076)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('B', 5, 0.0612)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('B', 6, 0.0742)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('B', 7, 0.0175)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('B', 8, 0.0959)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('B', 9, 0.097)
                      INSERT INTO tblGeomMean (Investment, Period, AnnualReturn) VALUES ('B', 10, 0.0039)
                      
                      CREATE TABLE tblWeightedGeomMean (ID int IDENTITY(1, 1), Investment varchar(20), Segment int, 
                          Years int, AnnualReturn decimal(9, 6))
                      
                      INSERT INTO tblWeightedGeomMean (Investment, Segment, Years, AnnualReturn) VALUES ('A', 1, 1, 0.2382)
                      INSERT INTO tblWeightedGeomMean (Investment, Segment, Years, AnnualReturn) VALUES ('A', 2, 2, -0.0946)
                      INSERT INTO tblWeightedGeomMean (Investment, Segment, Years, AnnualReturn) VALUES ('A', 3, 3, 0.281)
                      INSERT INTO tblWeightedGeomMean (Investment, Segment, Years, AnnualReturn) VALUES ('A', 4, 4, -0.1071)
                      INSERT INTO tblWeightedGeomMean (Investment, Segment, Years, AnnualReturn) VALUES ('B', 1, 2, 0.1327)
                      INSERT INTO tblWeightedGeomMean (Investment, Segment, Years, AnnualReturn) VALUES ('B', 2, 5, 0.0121)
                      INSERT INTO tblWeightedGeomMean (Investment, Segment, Years, AnnualReturn) VALUES ('B', 3, 3, -0.3234)
                      INSERT INTO tblWeightedGeomMean (Investment, Segment, Years, AnnualReturn) VALUES ('B', 4, 4, -0.0773)
                      INSERT INTO tblWeightedGeomMean (Investment, Segment, Years, AnnualReturn) VALUES ('B', 5, 3, 0.1484)
                      INSERT INTO tblWeightedGeomMean (Investment, Segment, Years, AnnualReturn) VALUES ('B', 6, 2, -0.1648)
                      INSERT INTO tblWeightedGeomMean (Investment, Segment, Years, AnnualReturn) VALUES ('B', 7, 2, 0.0149)
                      INSERT INTO tblWeightedGeomMean (Investment, Segment, Years, AnnualReturn) VALUES ('B', 8, 4, 0.0458)
                      INSERT INTO tblWeightedGeomMean (Investment, Segment, Years, AnnualReturn) VALUES ('B', 9, 5, 0.0932)
                      INSERT INTO tblWeightedGeomMean (Investment, Segment, Years, AnnualReturn) VALUES ('B', 10, 2, 0.0031)
                      
                      CREATE TABLE tblQuadMean (ID int IDENTITY(1, 1), Series varchar(20), Point int, Theory decimal(12, 6), 
                          Observed decimal(12, 6))
                      
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('A', 1, 11, -3)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('A', 2, 16, 19)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('A', 3, 19, 18)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('A', 4, 20, 21)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('A', 5, 19, 14)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('A', 6, 16, -4)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('A', 7, 11, 17)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('A', 8, 4, 13)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('A', 9, -5, 0)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('A', 10, -16, -19)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('B', 1, 3, 5)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('B', 2, 4, 15)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('B', 3, 7, 18)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('B', 4, 12, 18)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('B', 5, 19, 17)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('B', 6, 28, 29)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('B', 7, 39, 32)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('B', 8, 52, 47)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('B', 9, 67, 65)
                      INSERT INTO tblQuadMean (Series, Point, Theory, Observed) VALUES ('B', 10, 84, 94) 

Open in new window



Geometric Mean



The geometric mean is typically used to determine the “average” factor when a series of numbers are to be multiplied together, or in cases of exponential growth or decay.  To calculate the geometric mean, take the Nth root of the product obtained by using each member of a data set as a factor:

Geometric Mean
The geometric mean can only be calculated when all members of the data set are positive.

For example, suppose you have a rectangular prism, with length = 20, width = 10, and height = 5.  The volume for that shape is the product of the three dimensions: 20 x 10 x 5 = 1000.  Now suppose that you want to know the dimensions of a cube having the same volume.  Since the dimensions of a cube are, by definition, equal, then the dimensions are equal to the cube root of 1000, or 10.  Thus, 10 is also the geometric mean of the data set {20, 10, 5}.

The simplest way to calculate geometric means in Microsoft SQL Server is by leveraging a property of logarithms: the logarithm of the product of any set of positive factors is equal to the sum of each factor.  Thus, we can re-write the basic formula for determining the geometric mean in terms of natural logarithms and the special number e, as SQL Server provides built-in functions, LOG and EXP:

Geometric Mean
Translating this into a generic SQL statement yields:

SELECT {group by columns if desired,} CASE
                          WHEN MIN([ValueColumn]) > 0 THEN EXP(AVG(LOG(
                              CASE WHEN [ValueColumn] > 0 THEN [ValueColumn] ELSE 1 END)))
                          ELSE NULL END AS GeomMean
                      FROM {tables}
                      {GROUP BY {group by columns}}

Open in new window


Note: This generic SQL statement uses CASE expressions to handle the constraint that all of the values in the data set must either be null or be positive numbers.  The “outer” CASE expression forces a return of Null if any member of the data set is less than or equal to zero.  The “inner” CASE expression avoids the error that would arise from passing an invalid value to the LOG function.

If the ValueColumn in your table contains a rate of growth or decay (for example, annual investment returns of 4%, 8%, -15%, and 6%), then we need to adjust these rates by adding one to each item, thus transforming the values into the factors that would be used in determining the end result of the growth or decay.  The generic SQL statement then becomes:

SELECT {group by columns if desired,} CASE
                          WHEN MIN([ValueColumn]) > -1 THEN EXP(AVG(LOG(1 +
                              (CASE WHEN [ValueColumn] > -1 THEN [ValueColumn] ELSE 1 END)))) - 1
                          ELSE NULL END AS GeomMean
                      FROM {tables}
                      {GROUP BY {group by columns}}

Open in new window


Note: This generic SQL statement uses CASE expressions similarly, except that instead of testing for greater than zero, we instead test for greater than negative one: since we are already adding one to each item, any table value greater than negative one will become positive after this adjustment.

Example 5: Geometric Mean

A common problem in finance is to determine the compound annual growth rate, or CAGR, for an investment over some number of periods, for which the rates of return for each period are varying but known.  In such cases, the CAGR will be the geometric mean.

Consider the per-period returns for the following two investments:

tblGeomMean
To find the CAGRs for these two investments, use the following SQL statement:

SELECT Investment, CASE
                          WHEN MIN([AnnualReturn]) > -1 THEN EXP(AVG(LOG(1 + (CASE 
                              WHEN [AnnualReturn] > -1 THEN [AnnualReturn] ELSE 1 END)))) - 1
                          ELSE NULL END AS CAGR
                      FROM tblGeomMean
                      GROUP BY Investment;

Open in new window


That SQL statement yields the following results:

Example 5 Results
To validate those results, consider the period-by-period results for Investment A:

Geometric Mean Validation
Both approaches yield the same result: -0.0040, or -0.40%.


Weighted Geometric Mean


As seen with the harmonic mean in Part 1 of this article, there is also a weighted form for the geometric mean.  As the name suggests, use the weighted geometric mean in cases for which the various factors will have unequal weights.  For a data set with values {x1, x2, x3, …, xn} and weights {w1, w2, w3, …, wn}, the weighted geometric mean is:

Weighted Geometric Mean
As above for the simple geometric mean, by using the rules of operations with logarithms we are able to simply the formula and place it into terms SQL Server can readily process by using natural logarithms.

As in the simple geometric mean, the values must all be greater than zero.  Also, as with the weighted average and weighted harmonic mean, the weights cannot be negative, and there must be at least one non-zero weight.  Any SQL statement used for calculating the weighted harmonic mean must test for those conditions.

Translating this into a generic SQL statement yields:

SELECT {group by columns if desired,} CASE
                          WHEN MIN([WeightColumn]) >= 0 AND MAX([WeightColumn]) > 0
                              AND MIN([ValueColumn]) > 0 THEN
                              EXP(SUM([WeightColumn] * LOG(CASE WHEN [ValueColumn] > 0 THEN [ValueColumn] ELSE 1 END))) /
                              SUM([WeightColumn]))
                          ELSE NULL END AS WtdGeomMean
                      FROM {tables}
                      {GROUP BY {group by columns}}

Open in new window


Note: This generic SQL statement uses CASE expressions to handle the constraints that all of the values in the data set must either be null or be positive numbers, that weights cannot be negative, and that there must be at least one non-zero weight.  The “outer” CASE expression forces a return of Null if any value in the data set is less than or equal to zero, or if the weights are invalid.  (Any items with a zero weight are ignored.)  The “inner” CASE expression avoids the error that would arise from passing an invalid value to the LOG function.

As with the simple geometric mean, if the ValueColumn in your table contains a rate of growth or decay (for example, annual investment returns of 4%, 8%, -15%, and 6%), then we need to adjust these rates by adding one to each item, thus transforming the values into the factors that would be used in determining the end result of the growth or decay.  The generic SQL statement then becomes:

SELECT {group by columns if desired,} CASE
                          WHEN MIN([WeightColumn]) >= 0 AND MAX([WeightColumn]) > 0
                              AND MIN([ValueColumn]) > -1 THEN
                              EXP(SUM([WeightColumn] * LOG(1 + CASE WHEN [ValueColumn] > -1 THEN [ValueColumn] ELSE 1 END))) /
                              SUM([WeightColumn])) - 1
                          ELSE NULL END AS WtdGeomMean
                      FROM {tables}
                      {GROUP BY {group by columns}}

Open in new window


Note: This generic SQL statement uses CASE expressions similarly, except that instead of testing for greater than zero, we instead test for greater than negative one: since we are already adding one to each item, any table value greater than negative one will become positive after this adjustment.

Example 6: Weighted Geometric Mean

This example is similar to Example 5 above, except that this time the rates of return are not stated for each individual period.  Instead, our table shows the annual rates of return for periods of a varying number of years.

tblWtdGeomMean
Thus, to calculate the CAGR for each investment, we must use a weighted geometric mean, for which the weights are the length in years for each segment.  To obtain the CAGR for each investment, use the following SQL statement:

SELECT Investment, CASE 
                          WHEN MIN([Years]) >= 0 And MAX([Years])>0 And MIN([AnnualReturn]) > -1 THEN
                              EXP(SUM([Years] * LOG(1 + CASE
                                  WHEN [AnnualReturn] > -1 THEN [AnnualReturn]
                                  ELSE 1 END)) / SUM([Years])) - 1
                          ELSE Null END AS CAGR
                      FROM tblWeightedGeomMean
                      GROUP BY Investment;

Open in new window


This yields the following results:

Example 6 Results
To validate those results, consider the period-by-period results for Investment A:

Weighted Geometric Mean Validation
Both approaches yield the same result: 0.0309, or 3.09%.


Quadratic Mean



Also sometimes called the root mean square, the quadratic mean is the square root of the average of the squared values in the data set.  The general formula for the quadratic mean is:

Quadratic Mean
Like the arithmetic mean and weighted average, but unlike the harmonic mean and geometric mean, the values in the data set can be any real number.

Translating this into a generic SQL statement yields:

SELECT {group by columns if desired,} POWER(AVG(POWER([ValueColumn], 2)), 0.5) AS QuadMean
                      FROM {tables}
                      {GROUP BY {group by columns}}

Open in new window


Example 7: Quadratic Mean

The quadratic mean is often used to measure the variability of the pair-wise differences in data points for two data sets.  For example, suppose we have two series of experimental data; for each series, our table records the theoretical values for each series, as well as the observed values recorded during the experiment.

tblQuadMean
To understand the variability of the differences between the theoretical and observed values, we can use the quadratic mean of the differences in the pair-wise values.  To calculate the quadratic mean, use the following SQL statement:

SELECT Series, POWER(AVG(POWER([Theory] - [Observed], 2)), 0.5) AS QuadMean
                      FROM tblQuadMean
                      GROUP BY Series;

Open in new window


This yields the following results:

Example 7 Results

Generalized Power Mean



All of the means discussed in this article, as well as the familiar arithmetic mean, minimum, and maximum, can all be derived from a generalized power mean.  To calculate the generalized mean of a data set for any power p, use the following formula:

Power Mean
For the generalized mean, the following special cases may be of interest:

p = 1: arithmetic mean
p = -1: harmonic mean
p = 0: the generalized mean is undefined for p = 0, but the geometric mean is the limit of the generalized mean as p approaches zero
Minimum: the limit of the generalized mean approaches negative infinity is the data set minimum
Maximum: the limit of the generalized mean approaches positive infinity is the data set maximum

Translating this into a generic SQL statement yields (substitute your desired exponent for p):

SELECT {group by columns if desired,} POWER(AVG(POWER([ValueColumn], p)), (1 / p)) AS GeneralMean
                      FROM {tables}
                      {GROUP BY {group by columns}}

Open in new window


Note: Since division by zero is undefined, this general form cannot be used to determine the geometric mean.  This generalized form is also inappropriate for the harmonic mean unless you are confident that all of the values are positive numbers.

As with the other means, the generalized mean also has a weighted form:

Weighted Power Mean
Translating this into a generic SQL statement yields (substitute your desired exponent for p):

SELECT {group by columns if desired,} POWER((CASE
                          WHEN MIN([WeightColumn]) >= 0 AND MAX([WeightColumn]) > 0 THEN
                              Sum([WeightColumn] * POWER([ValueColumn], p))
                          ELSE NULL END) / SUM([WeightColumn]), (1 / p)) AS WtdGeneralMean
                      FROM {tables}
                      {GROUP BY {group by columns}}

Open in new window


Note: The same caveats as for the un-weighted generalized mean apply here.  In addition, as with the weighted average and weighted harmonic mean, the weights cannot be negative, and you must have at least one positive weight.  There is no need to test for the sum of weights being zero: if the sum of weights is zero, the numerator will already be Null, and so SQL Server will not attempt to perform the division.

I am not providing examples for the generalized mean, as it is highly unlikely that you will ever need to handle any mean it can generate other than the arithmetic, harmonic, geometric, or quadratic mean, weighted or un-weighted, and this article already provides examples optimized for those sorts of means.


=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If you liked this article and want to see more from this author,  please click here.

If you found this article helpful, please click the Yes button near the:

      Was this article helpful?

label that is just below and to the right of this text.   Thanks!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
4
11,919 Views
Patrick Matthews
CERTIFIED EXPERT

Comments (0)

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.