How does one compute Geometric Mean of a column in a SAS dataset? Also Harmonic mean?

It seems possible to compute GM of a row in SAS, but that would require using PROC TRANSPOSE. I couldn't find how to compute this directly for one or more columns. It seems surprising that this fairly mundane descriptive statistic.

thanks in advance.

- stochastic

It seems possible to compute GM of a row in SAS, but that would require using PROC TRANSPOSE. I couldn't find how to compute this directly for one or more columns. It seems surprising that this fairly mundane descriptive statistic.

thanks in advance.

- stochastic

Experts Exchange Solution brought to you by

Enjoy your complimentary solution view.

Get every solution instantly with premium.
Start your 7-day free trial.

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Just thought that you may need to calculate means for each category in a dataset. The following code works like a proc means with a by statement. It works the same way as the previous code, but stores the results into temporary arrays as it goes along.

%let NCATS = 4;

data tst (keep=VAR CAT);

length VAR 8.;

do i=1 to 10;

CAT = 1;

VAR = i;

output;

end;

do CAT=2 to &NCATS;

do i=1 to 10;

VAR = ranuni(1);

output;

end;

end;

run;

/* calculate different sorts of means, and split the results by CAT */

/* note this only works if the categories are numeric variables */

/* if not, convert them to one using an informat */

proc sort data=tst; by CAT; run; quit;

data tst_m (keep=CAT ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN);

set tst end=EOF;

by CAT;

/* setup arrays */

array COUNT{&NCATS} _temporary_;

array PRODUCT{&NCATS} _temporary_;

array SUMS{&NCATS} _temporary_;

array SUM_LOGS{&NCATS} _temporary_;

array SUM_INV{&NCATS} _temporary_;

length ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN 8.;

LOG_VAR = log(VAR);

/* initialise to 0 */

if first.CAT then do;

COUNT{CAT}= 0 ;

PRODUCT{CAT}=1;

SUM_LOGS{CAT} = 0;

SUMS{CAT} =0 ;

SUM_INV{CAT} = 0;

end;

/* note - product gets too big in most cases, so sum logs instead */

/* store all sums and counts in temporary arrays, at the position for this category */

COUNT{CAT}=COUNT{CAT}+1;

PRODUCT{CAT}=PRODUCT{CAT}*

SUM_LOGS{CAT}=SUM_LOGS{CAT

SUMS{CAT}=SUMS{CAT}+VAR;

SUM_INV{CAT}=SUM_INV{CAT}+

if EOF then do;

/* output arrays into a long dataset */

do CAT=1 to &NCATS;

ARITHMETIC_MEAN = SUMS{CAT}/COUNT{CAT};

GEOMETRIC_MEAN_A = PRODUCT{CAT}**(1/COUNT{CAT

GEOMETRIC_MEAN_B = exp((1/COUNT{CAT})*SUM_LOG

HARMONIC_MEAN = 1/(SUM_INV{CAT}/COUNT{CAT}

output;

end;

end;

run;

Andy

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trialI apologize for the delayed response. Thanks for your suggestions, I will try them out. Meanwhile

I'm accepting your comment as a solution. What remains is the curiosity about why this isn't a built-in feature.

- stochastic

Databases

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Experts Exchange Solution brought to you by

Enjoy your complimentary solution view.

Get every solution instantly with premium.
Start your 7-day free trial.

You're right - proc means will not do this. Why not???

Anyway, I think will the following SAS code will.

Note - comment out all lines using PRODUCT if you have a lot of variables >1, otherwise the numbers get too big!

/* dataset to try code on */

data tst;

length VAR 8.;

do VAR=1 to 10;

output;

end;

run;

/* datastep to calculate all sorts of means */

data tst_m (keep=ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN);

set tst end=EOF;

retain COUNT PRODUCT SUM SUM_LOGS SUM_INV;

length COUNT PRODUCT SUM SUM_LOGS SUM_INV ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN 8.;

LOG_VAR = log(VAR);

/* initialise to 0 */

if _N_=1 then do;

COUNT= 0 ;

PRODUCT=1;

SUM_LOGS = 0;

SUM =0 ;

SUM_INV = 0;

end;

/* note - product gets too big in most cases, so sum logs instead */

COUNT=COUNT+1;

PRODUCT=PRODUCT*VAR;

SUM_LOGS=SUM_LOGS+LOG_VAR;

SUM=SUM+VAR;

SUM_INV=SUM_INV+1/VAR;

if EOF then do;

ARITHMETIC_MEAN = SUM/COUNT;

GEOMETRIC_MEAN_A = PRODUCT**(1/COUNT);

GEOMETRIC_MEAN_B = exp((1/COUNT)*SUM_LOGS);

HARMONIC_MEAN = 1/(SUM_INV/COUNT);

output;

end;

run;