?
Solved

Geometric Mean and Harmonic Mean in SAS

Posted on 2007-04-11
4
Medium Priority
?
6,767 Views
Last Modified: 2013-11-16
How does one compute Geometric Mean of a column in a SAS dataset? Also Harmonic mean?
It seems possible to compute GM of a row in SAS, but that would require using PROC TRANSPOSE. I couldn't find how to compute this directly for one or more columns. It seems  surprising that this fairly mundane descriptive statistic.

thanks in advance.

- stochastic
0
Comment
Question by:stochastic
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
4 Comments
 
LVL 2

Expert Comment

by:andymcooper
ID: 19030192
Stochastic,

You're right - proc means will not do this. Why not???
Anyway, I think will the following SAS code will.
Note - comment out all lines using PRODUCT if you have a lot of variables >1, otherwise the numbers get too big!

/* dataset to try code on */
data tst;
  length VAR 8.;
  do VAR=1 to 10;
    output;
  end;
run;

/* datastep to calculate all sorts of means */
data tst_m (keep=ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN);
  set tst end=EOF;
  retain COUNT PRODUCT SUM SUM_LOGS SUM_INV;
 
  length COUNT PRODUCT SUM SUM_LOGS SUM_INV ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN 8.;
 
  LOG_VAR = log(VAR);

  /* initialise to 0 */
  if _N_=1 then do;
    COUNT= 0 ;
    PRODUCT=1;
    SUM_LOGS = 0;
    SUM =0 ;
    SUM_INV = 0;
  end;

  /* note - product gets too big in most cases, so sum logs instead */
  COUNT=COUNT+1;
  PRODUCT=PRODUCT*VAR;
  SUM_LOGS=SUM_LOGS+LOG_VAR;
  SUM=SUM+VAR;
  SUM_INV=SUM_INV+1/VAR;

  if EOF then do;
    ARITHMETIC_MEAN = SUM/COUNT;
    GEOMETRIC_MEAN_A = PRODUCT**(1/COUNT);
    GEOMETRIC_MEAN_B = exp((1/COUNT)*SUM_LOGS);
    HARMONIC_MEAN = 1/(SUM_INV/COUNT);
    output;
  end;
run;
0
 
LVL 2

Accepted Solution

by:
andymcooper earned 2000 total points
ID: 19030356
Stochastic,

Just thought that you may need to calculate means for each category in a dataset. The following code works like a proc means with a by statement. It works the same way as the previous code, but stores the results into temporary arrays as it goes along.

%let NCATS = 4;

data tst (keep=VAR CAT);
  length VAR 8.;
  do i=1 to 10;
    CAT = 1;
    VAR = i;
    output;
  end;
  do CAT=2 to &NCATS;
    do i=1 to 10;
      VAR = ranuni(1);
      output;
    end;
  end;
run;

/* calculate different sorts of means, and split the results by CAT */
/* note this only works if the categories are numeric variables */
/* if not, convert them to one using an informat */

proc sort data=tst; by CAT; run; quit;

data tst_m (keep=CAT ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN);
  set tst end=EOF;
  by CAT;

  /* setup arrays */
  array COUNT{&NCATS} _temporary_;
  array PRODUCT{&NCATS} _temporary_;
  array SUMS{&NCATS} _temporary_;  
  array SUM_LOGS{&NCATS} _temporary_;  
  array SUM_INV{&NCATS} _temporary_;  

  length ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN 8.;
 
  LOG_VAR = log(VAR);

  /* initialise to 0 */
  if first.CAT then do;
    COUNT{CAT}= 0 ;
    PRODUCT{CAT}=1;
    SUM_LOGS{CAT} = 0;
    SUMS{CAT} =0 ;
    SUM_INV{CAT} = 0;
  end;

  /* note - product gets too big in most cases, so sum logs instead */
  /* store all sums and counts in temporary arrays, at the position for this category */
  COUNT{CAT}=COUNT{CAT}+1;
  PRODUCT{CAT}=PRODUCT{CAT}*VAR;
  SUM_LOGS{CAT}=SUM_LOGS{CAT}+LOG_VAR;
  SUMS{CAT}=SUMS{CAT}+VAR;
  SUM_INV{CAT}=SUM_INV{CAT}+1/VAR;

  if EOF then do;
    /* output arrays into a long dataset */
    do CAT=1 to &NCATS;
      ARITHMETIC_MEAN = SUMS{CAT}/COUNT{CAT};
      GEOMETRIC_MEAN_A = PRODUCT{CAT}**(1/COUNT{CAT});
      GEOMETRIC_MEAN_B = exp((1/COUNT{CAT})*SUM_LOGS{CAT});
      HARMONIC_MEAN = 1/(SUM_INV{CAT}/COUNT{CAT});
      output;
    end;
  end;
run;

Andy
0
 
LVL 8

Author Comment

by:stochastic
ID: 19222808
andymcooper,

I apologize for the delayed response. Thanks for your suggestions, I will try them out. Meanwhile
I'm accepting your comment as a solution. What remains is the curiosity about why this isn't a built-in feature.

- stochastic
0

Featured Post

Get proactive database performance tuning online

At Percona’s web store you can order full Percona Database Performance Audit in minutes. Find out the health of your database, and how to improve it. Pay online with a credit card. Improve your database performance now!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A Stored Procedure in Microsoft SQL Server is a powerful feature that it can be used to execute the Data Manipulation Language (DML) or Data Definition Language (DDL). Depending on business requirements, a single Stored Procedure can return differe…
Recently I was talking with Tim Sharp, one of my colleagues from our Technical Account Manager team about MongoDB’s scalability. While doing some quick training with some of the Percona team, Tim brought something to my attention...
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…
Suggested Courses

765 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question