# Geometric Mean and Harmonic Mean in SAS

How does one compute Geometric Mean of a column in a SAS dataset? Also Harmonic mean?
It seems possible to compute GM of a row in SAS, but that would require using PROC TRANSPOSE. I couldn't find how to compute this directly for one or more columns. It seems  surprising that this fairly mundane descriptive statistic.

- stochastic
LVL 8
###### Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Commented:
Stochastic,

You're right - proc means will not do this. Why not???
Anyway, I think will the following SAS code will.
Note - comment out all lines using PRODUCT if you have a lot of variables >1, otherwise the numbers get too big!

/* dataset to try code on */
data tst;
length VAR 8.;
do VAR=1 to 10;
output;
end;
run;

/* datastep to calculate all sorts of means */
data tst_m (keep=ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN);
set tst end=EOF;
retain COUNT PRODUCT SUM SUM_LOGS SUM_INV;

length COUNT PRODUCT SUM SUM_LOGS SUM_INV ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN 8.;

LOG_VAR = log(VAR);

/* initialise to 0 */
if _N_=1 then do;
COUNT= 0 ;
PRODUCT=1;
SUM_LOGS = 0;
SUM =0 ;
SUM_INV = 0;
end;

/* note - product gets too big in most cases, so sum logs instead */
COUNT=COUNT+1;
PRODUCT=PRODUCT*VAR;
SUM_LOGS=SUM_LOGS+LOG_VAR;
SUM=SUM+VAR;
SUM_INV=SUM_INV+1/VAR;

if EOF then do;
ARITHMETIC_MEAN = SUM/COUNT;
GEOMETRIC_MEAN_A = PRODUCT**(1/COUNT);
GEOMETRIC_MEAN_B = exp((1/COUNT)*SUM_LOGS);
HARMONIC_MEAN = 1/(SUM_INV/COUNT);
output;
end;
run;
0
Commented:
Stochastic,

Just thought that you may need to calculate means for each category in a dataset. The following code works like a proc means with a by statement. It works the same way as the previous code, but stores the results into temporary arrays as it goes along.

%let NCATS = 4;

data tst (keep=VAR CAT);
length VAR 8.;
do i=1 to 10;
CAT = 1;
VAR = i;
output;
end;
do CAT=2 to &NCATS;
do i=1 to 10;
VAR = ranuni(1);
output;
end;
end;
run;

/* calculate different sorts of means, and split the results by CAT */
/* note this only works if the categories are numeric variables */
/* if not, convert them to one using an informat */

proc sort data=tst; by CAT; run; quit;

data tst_m (keep=CAT ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN);
set tst end=EOF;
by CAT;

/* setup arrays */
array COUNT{&NCATS} _temporary_;
array PRODUCT{&NCATS} _temporary_;
array SUMS{&NCATS} _temporary_;
array SUM_LOGS{&NCATS} _temporary_;
array SUM_INV{&NCATS} _temporary_;

length ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN 8.;

LOG_VAR = log(VAR);

/* initialise to 0 */
if first.CAT then do;
COUNT{CAT}= 0 ;
PRODUCT{CAT}=1;
SUM_LOGS{CAT} = 0;
SUMS{CAT} =0 ;
SUM_INV{CAT} = 0;
end;

/* note - product gets too big in most cases, so sum logs instead */
/* store all sums and counts in temporary arrays, at the position for this category */
COUNT{CAT}=COUNT{CAT}+1;
PRODUCT{CAT}=PRODUCT{CAT}*VAR;
SUM_LOGS{CAT}=SUM_LOGS{CAT}+LOG_VAR;
SUMS{CAT}=SUMS{CAT}+VAR;
SUM_INV{CAT}=SUM_INV{CAT}+1/VAR;

if EOF then do;
/* output arrays into a long dataset */
do CAT=1 to &NCATS;
ARITHMETIC_MEAN = SUMS{CAT}/COUNT{CAT};
GEOMETRIC_MEAN_A = PRODUCT{CAT}**(1/COUNT{CAT});
GEOMETRIC_MEAN_B = exp((1/COUNT{CAT})*SUM_LOGS{CAT});
HARMONIC_MEAN = 1/(SUM_INV{CAT}/COUNT{CAT});
output;
end;
end;
run;

Andy
0

Experts Exchange Solution brought to you by

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Author Commented:
andymcooper,

I apologize for the delayed response. Thanks for your suggestions, I will try them out. Meanwhile
I'm accepting your comment as a solution. What remains is the curiosity about why this isn't a built-in feature.

- stochastic
0
###### It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Databases

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.