Solved

Geometric Mean and Harmonic Mean in SAS

Posted on 2007-04-11
4
6,701 Views
Last Modified: 2013-11-16
How does one compute Geometric Mean of a column in a SAS dataset? Also Harmonic mean?
It seems possible to compute GM of a row in SAS, but that would require using PROC TRANSPOSE. I couldn't find how to compute this directly for one or more columns. It seems  surprising that this fairly mundane descriptive statistic.

thanks in advance.

- stochastic
0
Comment
Question by:stochastic
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
4 Comments
 
LVL 2

Expert Comment

by:andymcooper
ID: 19030192
Stochastic,

You're right - proc means will not do this. Why not???
Anyway, I think will the following SAS code will.
Note - comment out all lines using PRODUCT if you have a lot of variables >1, otherwise the numbers get too big!

/* dataset to try code on */
data tst;
  length VAR 8.;
  do VAR=1 to 10;
    output;
  end;
run;

/* datastep to calculate all sorts of means */
data tst_m (keep=ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN);
  set tst end=EOF;
  retain COUNT PRODUCT SUM SUM_LOGS SUM_INV;
 
  length COUNT PRODUCT SUM SUM_LOGS SUM_INV ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN 8.;
 
  LOG_VAR = log(VAR);

  /* initialise to 0 */
  if _N_=1 then do;
    COUNT= 0 ;
    PRODUCT=1;
    SUM_LOGS = 0;
    SUM =0 ;
    SUM_INV = 0;
  end;

  /* note - product gets too big in most cases, so sum logs instead */
  COUNT=COUNT+1;
  PRODUCT=PRODUCT*VAR;
  SUM_LOGS=SUM_LOGS+LOG_VAR;
  SUM=SUM+VAR;
  SUM_INV=SUM_INV+1/VAR;

  if EOF then do;
    ARITHMETIC_MEAN = SUM/COUNT;
    GEOMETRIC_MEAN_A = PRODUCT**(1/COUNT);
    GEOMETRIC_MEAN_B = exp((1/COUNT)*SUM_LOGS);
    HARMONIC_MEAN = 1/(SUM_INV/COUNT);
    output;
  end;
run;
0
 
LVL 2

Accepted Solution

by:
andymcooper earned 500 total points
ID: 19030356
Stochastic,

Just thought that you may need to calculate means for each category in a dataset. The following code works like a proc means with a by statement. It works the same way as the previous code, but stores the results into temporary arrays as it goes along.

%let NCATS = 4;

data tst (keep=VAR CAT);
  length VAR 8.;
  do i=1 to 10;
    CAT = 1;
    VAR = i;
    output;
  end;
  do CAT=2 to &NCATS;
    do i=1 to 10;
      VAR = ranuni(1);
      output;
    end;
  end;
run;

/* calculate different sorts of means, and split the results by CAT */
/* note this only works if the categories are numeric variables */
/* if not, convert them to one using an informat */

proc sort data=tst; by CAT; run; quit;

data tst_m (keep=CAT ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN);
  set tst end=EOF;
  by CAT;

  /* setup arrays */
  array COUNT{&NCATS} _temporary_;
  array PRODUCT{&NCATS} _temporary_;
  array SUMS{&NCATS} _temporary_;  
  array SUM_LOGS{&NCATS} _temporary_;  
  array SUM_INV{&NCATS} _temporary_;  

  length ARITHMETIC_MEAN GEOMETRIC_MEAN_A GEOMETRIC_MEAN_B HARMONIC_MEAN 8.;
 
  LOG_VAR = log(VAR);

  /* initialise to 0 */
  if first.CAT then do;
    COUNT{CAT}= 0 ;
    PRODUCT{CAT}=1;
    SUM_LOGS{CAT} = 0;
    SUMS{CAT} =0 ;
    SUM_INV{CAT} = 0;
  end;

  /* note - product gets too big in most cases, so sum logs instead */
  /* store all sums and counts in temporary arrays, at the position for this category */
  COUNT{CAT}=COUNT{CAT}+1;
  PRODUCT{CAT}=PRODUCT{CAT}*VAR;
  SUM_LOGS{CAT}=SUM_LOGS{CAT}+LOG_VAR;
  SUMS{CAT}=SUMS{CAT}+VAR;
  SUM_INV{CAT}=SUM_INV{CAT}+1/VAR;

  if EOF then do;
    /* output arrays into a long dataset */
    do CAT=1 to &NCATS;
      ARITHMETIC_MEAN = SUMS{CAT}/COUNT{CAT};
      GEOMETRIC_MEAN_A = PRODUCT{CAT}**(1/COUNT{CAT});
      GEOMETRIC_MEAN_B = exp((1/COUNT{CAT})*SUM_LOGS{CAT});
      HARMONIC_MEAN = 1/(SUM_INV{CAT}/COUNT{CAT});
      output;
    end;
  end;
run;

Andy
0
 
LVL 8

Author Comment

by:stochastic
ID: 19222808
andymcooper,

I apologize for the delayed response. Thanks for your suggestions, I will try them out. Meanwhile
I'm accepting your comment as a solution. What remains is the curiosity about why this isn't a built-in feature.

- stochastic
0

Featured Post

Webinar: MongoDB® Index Types

Join Percona’s Senior Technical Services Engineer, Adamo Tonete as he presents “MongoDB Index Types, How, When and Where Should They be Used?” on Wednesday, July 12, 2017 at 11:00 am PDT / 2:00 pm EDT (UTC-7).

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Never store passwords in plain text or just their hash: it seems a no-brainier, but there are still plenty of people doing that. I present the why and how on this subject, offering my own real life solution that you can implement right away, bringin…
Recently, Microsoft released a best-practice guide for securing Active Directory. It's a whopping 300+ pages long. Those of us tasked with securing our company’s databases and systems would, ideally, have time to devote to learning the ins and outs…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…
This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…

719 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question