?
Solved

Geometric Mean and Harmonic Mean in SAS

Posted on 2007-04-11
4
Medium Priority
?
6,868 Views
Last Modified: 2013-11-16
How does one compute Geometric Mean of a column in a SAS dataset? Also Harmonic mean?
It seems possible to compute GM of a row in SAS, but that would require using PROC TRANSPOSE. I couldn't find how to compute this directly for one or more columns. It seems  surprising that this fairly mundane descriptive statistic.

thanks in advance.

- stochastic
0
Comment
Question by:stochastic
  • 2
3 Comments
 
LVL 2

Expert Comment

by:andymcooper
ID: 19030192
Stochastic,

You're right - proc means will not do this. Why not???
Anyway, I think will the following SAS code will.
Note - comment out all lines using PRODUCT if you have a lot of variables >1, otherwise the numbers get too big!

/* dataset to try code on */
data tst;
  length VAR 8.;
  do VAR=1 to 10;
    output;
  end;
run;

/* datastep to calculate all sorts of means */
data tst_m (keep=ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN);
  set tst end=EOF;
  retain COUNT PRODUCT SUM SUM_LOGS SUM_INV;
 
  length COUNT PRODUCT SUM SUM_LOGS SUM_INV ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN 8.;
 
  LOG_VAR = log(VAR);

  /* initialise to 0 */
  if _N_=1 then do;
    COUNT= 0 ;
    PRODUCT=1;
    SUM_LOGS = 0;
    SUM =0 ;
    SUM_INV = 0;
  end;

  /* note - product gets too big in most cases, so sum logs instead */
  COUNT=COUNT+1;
  PRODUCT=PRODUCT*VAR;
  SUM_LOGS=SUM_LOGS+LOG_VAR;
  SUM=SUM+VAR;
  SUM_INV=SUM_INV+1/VAR;

  if EOF then do;
    ARITHMETIC_MEAN = SUM/COUNT;
    GEOMETRIC_MEAN_A = PRODUCT**(1/COUNT);
    GEOMETRIC_MEAN_B = exp((1/COUNT)*SUM_LOGS);
    HARMONIC_MEAN = 1/(SUM_INV/COUNT);
    output;
  end;
run;
0
 
LVL 2

Accepted Solution

by:
andymcooper earned 2000 total points
ID: 19030356
Stochastic,

Just thought that you may need to calculate means for each category in a dataset. The following code works like a proc means with a by statement. It works the same way as the previous code, but stores the results into temporary arrays as it goes along.

%let NCATS = 4;

data tst (keep=VAR CAT);
  length VAR 8.;
  do i=1 to 10;
    CAT = 1;
    VAR = i;
    output;
  end;
  do CAT=2 to &NCATS;
    do i=1 to 10;
      VAR = ranuni(1);
      output;
    end;
  end;
run;

/* calculate different sorts of means, and split the results by CAT */
/* note this only works if the categories are numeric variables */
/* if not, convert them to one using an informat */

proc sort data=tst; by CAT; run; quit;

data tst_m (keep=CAT ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN);
  set tst end=EOF;
  by CAT;

  /* setup arrays */
  array COUNT{&NCATS} _temporary_;
  array PRODUCT{&NCATS} _temporary_;
  array SUMS{&NCATS} _temporary_;  
  array SUM_LOGS{&NCATS} _temporary_;  
  array SUM_INV{&NCATS} _temporary_;  

  length ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN 8.;
 
  LOG_VAR = log(VAR);

  /* initialise to 0 */
  if first.CAT then do;
    COUNT{CAT}= 0 ;
    PRODUCT{CAT}=1;
    SUM_LOGS{CAT} = 0;
    SUMS{CAT} =0 ;
    SUM_INV{CAT} = 0;
  end;

  /* note - product gets too big in most cases, so sum logs instead */
  /* store all sums and counts in temporary arrays, at the position for this category */
  COUNT{CAT}=COUNT{CAT}+1;
  PRODUCT{CAT}=PRODUCT{CAT}*VAR;
  SUM_LOGS{CAT}=SUM_LOGS{CAT}+LOG_VAR;
  SUMS{CAT}=SUMS{CAT}+VAR;
  SUM_INV{CAT}=SUM_INV{CAT}+1/VAR;

  if EOF then do;
    /* output arrays into a long dataset */
    do CAT=1 to &NCATS;
      ARITHMETIC_MEAN = SUMS{CAT}/COUNT{CAT};
      GEOMETRIC_MEAN_A = PRODUCT{CAT}**(1/COUNT{CAT});
      GEOMETRIC_MEAN_B = exp((1/COUNT{CAT})*SUM_LOGS{CAT});
      HARMONIC_MEAN = 1/(SUM_INV{CAT}/COUNT{CAT});
      output;
    end;
  end;
run;

Andy
0
 
LVL 8

Author Comment

by:stochastic
ID: 19222808
andymcooper,

I apologize for the delayed response. Thanks for your suggestions, I will try them out. Meanwhile
I'm accepting your comment as a solution. What remains is the curiosity about why this isn't a built-in feature.

- stochastic
0

Featured Post

NFR key for Veeam Agent for Linux

Veeam is happy to provide a free NFR license for one year.  It allows for the non‑production use and valid for five workstations and two servers. Veeam Agent for Linux is a simple backup tool for your Linux installations, both on‑premises and in the public cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

One of the most important things in an application is the query performance. This article intends to give you good tips to improve the performance of your queries.
How much do you know about the future of data centers? If you're like 50% of organizations, then it's probably not enough. Read on to get up to speed on this emerging field.
This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…
Despite its rising prevalence in the business world, "the cloud" is still misunderstood. Some companies still believe common misconceptions about lack of security in cloud solutions and many misuses of cloud storage options still occur every day. …
Suggested Courses

850 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question