• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 694
  • Last Modified:

T-test

I need the source code of Student T-test (statistical test to compare means of two samples with normal distribution) in any of the followin program languages: C++, C#, VB, PHP, J#, or Java.
Or if you can find it in other languages (like C, Fortran or even assembly!) I need it to translated to a language I understand.
I only need the source code for comapring two independent samples, for now.
Thanks
Huji
0
huji
Asked:
huji
  • 8
  • 5
  • 2
  • +1
3 Solutions
 
TimYatesCommented:
0
 
hujiAuthor Commented:
Sorry but no. That worked example uses Excel. I need the source code to code my own program, which should not be Excel-dependent.
0
 
TimYatesCommented:
Fair enough, but if you look at the worked example above the excel one, it shows you what to do at each step of the way...

I assume this is some sort of college assignment?
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
tobydavidCommented:
I found this algorithm that looks easy enough to translate to your language of choice.  
(Source: http://lib.stat.cmu.edu/apstat/ )


      REAL FUNCTION STUDNT (T, DOFF, IFAULT)
C
C     ALGORITHM AS 27  APPL. STATIST. VOL.19, NO.1
C
C     Calculate the upper tail area under Student's t-distribution
C
C     Translated from Algol by Alan Miller
C
      INTEGER IFAULT
      REAL T, DOFF
C
C     Local variables
C
      REAL V, X, TT, TWO, FOUR, ONE, ZERO, HALF
      REAL A1, A2, A3, A4, A5, B1, B2,
     *     C1, C2, C3, C4, C5, D1, D2,
     *     E1, E2, E3, E4, E5, F1, F2,
     *     G1, G2, G3, G4, G5, H1, H2,
     *     I1, I2, I3, I4, I5, J1, J2
      LOGICAL POS
      DATA TWO /2.0/, FOUR /4.0/, ONE /1.0/, ZERO /0.0/, HALF /0.5/
      DATA A1, A2, A3, A4, A5 /0.09979441, -0.581821, 1.390993,
     *     -1.222452, 2.151185/, B1, B2 /5.537409, 11.42343/
      DATA C1, C2, C3, C4, C5 /0.04431742, -0.2206018, -0.03317253,
     *     5.679969, -12.96519/, D1, D2 /5.166733, 13.49862/
      DATA E1, E2, E3, E4, E5 /0.009694901, -0.1408854, 1.88993,
     *     -12.75532, 25.77532/, F1, F2 /4.233736, 14.3963/
      DATA G1, G2, G3, G4, G5 /-9.187228E-5, 0.03789901, -1.280346,
     *     9.249528, -19.08115/, H1, H2 /2.777816, 16.46132/
      DATA I1, I2, I3, I4, I5 /5.79602E-4, -0.02763334, 0.4517029,
     *     -2.657697, 5.127212/, J1, J2 /0.5657187, 21.83269/
C
C     Check that number of degrees of freedom > 4.
C
      IF (DOFF .LT. TWO) THEN
      IFAULT = 1
      STUDNT = - ONE
      RETURN
      END IF
C
      IF (DOFF .LE. FOUR) THEN
      IFAULT = DOFF
      ELSE
      IFAULT = 0
      END IF
C
C     Evaluate series.
C
      V = ONE / DOFF
      POS = (T .GE. ZERO)
      TT = ABS(T)
      X = HALF * (ONE +
     *    TT * (((A1 + V * (A2 + V * (A3 + V * (A4 + V * A5)))) /
     *        (ONE - V * (B1 - V * B2))) +
     *    TT * (((C1 + V * (C2 + V * (C3 + V * (C4 + V * C5)))) /
     *        (ONE - V * (D1 - V * D2))) +
     *    TT * (((E1 + V * (E2 + V * (E3 + V * (E4 + V * E5)))) /
     *        (ONE - V * (F1 - V * F2))) +
     *    TT * (((G1 + V * (G2 + V * (G3 + V * (G4 + V * G5)))) /
     *        (ONE - V * (H1 - V * H2))) +
     *    TT * ((I1 + V * (I2 + V * (I3 + V * (I4 + V * I5)))) /
     *        (ONE - V * (J1 - V * J2))) ))))) ** (-8)
      IF (POS) THEN
      STUDNT = X
      ELSE
      STUDNT = ONE - X
      END IF
C
      RETURN
      END

0
 
hujiAuthor Commented:
Nope this is not. This is a research project I'm carrying on. And I need the available source codes to compare with our current version and do some fixes.
And about what appears on the top section of that page: I know enough statistics to do that myself! *L* And there is a big problem: It checks the "t" statistic against a table, which is not that accurate, and this method will only result in a true/false output (either statistically significant or non-significant) while I need to get the p-value.
0
 
hujiAuthor Commented:
@tobydavid:
Well the code is nearly what I want. I'm not sure if I really can translate it to, for example, Visual Basic, without any mistakes.
Thanks
Huji
0
 
hujiAuthor Commented:
By the way, what language is that?
0
 
tobydavidCommented:
I think it may be Fortran-77.

(Any history buffs out there?)

I am looking for other versions that may not require translation.
0
 
hujiAuthor Commented:
An open source .Net statistics component can be a very worthful answer. :o)
0
 
tobydavidCommented:
I found where someone had translated the STATLAB code into C++

http://lib.stat.cmu.edu/general/gaut.c

Hope this helps.

-- tobydavid
0
 
hujiAuthor Commented:
The more I studies that file, the more I got certain that I can't understand it! I have previously coded in C++, but I still can't see where the code for this function is:
int tprob(double t, int dof, double *probability)
There is a ONEBYPI #defined and #defined later. Looking at the code inside, it is almost what I have done to calculate the p-value when "t" statistic is given. I'm happy it has given a direct reference for the Fortran code: Applied Statistics (1968) Vol. 17, p. 189.
I need to study more on this I think.
Huji
0
 
RNMcLeanCommented:
  The code is indeed Fortran, and the idiot who coded it hasn't bothered (or didn't know how) to use arrays, thus the tedious appearance of A1, A2, ... A5 rather than an array A, elements A(1) to A(5), and likewise C and so forth which could then be indexed appropriately rather than use tediously long and complex expressions.
   By chance, I have a prog. for dealing with some different statistics (skewness and kurtosis), which works off tables via interpolation using Aitken's method. For your case, you'd modify the table to correspond to your statistic, and use Aitken's method to interpolate as needed for the specific value of your parameter. Note that my parameter involves N which might be arbitrarily large, thus I use the reciprocal, 1/N so that infinity becomes the easily accessible value, zero in my interpolations.

      FUNCTION AITKEN(N,X,Y,V)
Calculate a value Y(V), given X and Y, by Aitken's method.
Cherish hopes that V is within the range of the X's (in any order).
Caution! There is little sense in going much beyond n = 5.
       REAL X(*),Y(*),V,D(24),T(24)
        M = MIN(N,24)
        DO 1 I = 1,M
          D(I) = X(I) - V
    1     T(I) = Y(I)
        DO 2 I = 2,M
          DO 2 J = I,M
    2       T(J) = (T(I - 1)*D(J) - T(J)*D(I - 1))/(X(J) - X(I - 1))
        AITKEN = T(M)
       RETURN
      END
      SUBROUTINE STATS2(NA,A,AV,SD,SKEW,KURT,SKIP0,RAVE)
       PARAMETER (NTABLE = 28)
Compute skewness and kurtosis and decide upon a suitable rave.
       LOGICAL SKIP0
       REAL A(*)
       DOUBLE PRECISION AV,SD,SKEW,KURT
       DOUBLE PRECISION D,D2,EV2,EV3,EV4
       CHARACTER*(*) RAVE
       CHARACTER*36 R1,R2
       REAL RTABLE(NTABLE),SKEW5(NTABLE),UKURT5(NTABLE),LKURT5(NTABLE)
Critical values of 5% tails from the Biometrica Tables, V1,p183-4, 1958.
       DATA RTABLE/
     1      7,   10,   15,   20,   25,   30,   35,   40,   45,   50,
     2     70,   75,  100,  125,  150,  175,  200,  250,  300,  400,
     3    500,  700, 1000, 2000, 3000, 4000, 5000, -666/
Corresponding 1/R:   .00100  50    33    25    20     0
       DATA SKEW5/
     1  1.008, .950, .862, .777, .714, .664, .624, .587, .558, .534,
     2   .459, .445, .389, .350, .321, .298, .280, .251, .230, .200,
     3   .179, .151, .127, .090, .073, .064, .057,   0 /
       DATA UKURT5/
     1   3.55, 3.95, 4.13, 4.17, 4.16, 4.11, 4.10, 4.06, 4.00, 3.99,
     2   3.88, 3.87, 3.77, 3.71, 3.65, 3.61, 3.57, 3.52, 3.47, 3.41,
     3   3.37, 3.31, 3.26, 3.18, 3.15, 3.13, 3.12,   3 /
       DATA LKURT5/
     1   1.41, 1.56, 1.72, 1.82, 1.91, 1.98, 2.03, 2.07, 2.11, 2.15,
     2   2.25, 2.27, 2.35, 2.40, 2.45, 2.48, 2.51, 2.55, 2.59, 2.64,
     3   2.67, 2.72, 2.76, 2.83, 2.86, 2.88, 2.89,   3 /
Can't expect a polynomial to follow an asymptote, so take a reciprocal.
        IF (RTABLE(NTABLE) .NE. -666) GO TO 10
        DO 1 I = 1,NTABLE - 1
    1     RTABLE(I) = 1/RTABLE(I)
Catch n = infinity by taking its reciprocal as zero...
        RTABLE(NTABLE) = 0

Calculate the parameters, assuming AV already has the average.
   10   EV2 = 0.0
        EV3 = 0.0
        EV4 = 0.0
        N = 0
        DO 11 I = 1,NA
          V = A(I)
          IF (SKIP0 .AND. V .EQ. 0.0) GO TO 11
          N = N + 1
          D = V - AV
          D2 = D*D
          EV2 = EV2 + D2
          EV3 = EV3 + D2*D
          EV4 = EV4 + D2*D2
   11     CONTINUE
        SD = SQRT(EV2/N)
        SKEW = 0
        KURT = 0
        IF (SD .GT. 0) THEN
          SKEW = N*EV3**2 /EV2**3
          IF (EV3 .LT. 0.0) SKEW = -SKEW
          KURT = N*EV4/EV2**2
        ENDIF

Chase after the first table value below n, in reciprocals.
   20   R = 1.0/N
        DO 21 I = NTABLE,1,-1
          IF (R.LE.RTABLE(I)) GO TO 22
   21     CONTINUE
   22   I1 = MAX(1,I - 1)
        I2 = MIN(NTABLE,I + 2)
        NI = I2 - I1 + 1
Carefully extend the range of entries that straddle the unknown value.
   23   IF (NI .LT. 4) I1 = MAX(1,I1 - 1)
        NI = I2 - I1 + 1
        IF (NI .LT. 4) I2 = MIN(NTABLE,I2 + 1)
        NI = I2 - I1 + 1
        IF (NI .LT. 4) GO TO 23
Concoct some 5% decision points. NB: the table has SQRT(skew)
   30   AS  = AITKEN(NI,RTABLE(I1),SKEW5(I1),R)**2
        AKL = AITKEN(NI,RTABLE(I1),LKURT5(I1),R)
        AKU = AITKEN(NI,RTABLE(I1),UKURT5(I1),R)
Collect the words of wonder. 5% tails on each side.
        R1 = 'not significantly skewed'
        IF (SKEW.LT.-AS) R1 = 'low-value tailed'
        IF (SKEW.GT. AS) R1 = 'high-value tailed'
        R2 = 'mesokurtic'
        IF (KURT.LT. AKL) R2 = 'platykurtic'
        IF (KURT.GT. AKU) R2 = 'leptokurtic'
        L1 = LASTNB(R1)
        L2 = LASTNB(R2)
        RAVE = R1(1:L1) // ', ' // R2(1:L2)
       RETURN
      END
0
 
tobydavidCommented:
I am not a C++ programmer, but I did play around with the C++ code.  First, I have the function actually return the probability rather than having it passed as the third parameter.  For the one possible error, namely less than 1 degree of freedom, it returns 999999.  Second, I changed it from a lower tail probability to an upper tail which I think is more commonly expected.  (The area under one tail is just 1 minus the area under the other.)

I tested it for a variety of examples and they all matched the table at the back of my Stat text, so I am pretty confident it is a valid routine.  Here is my little console program that tests it.  In this example, I use a T-score of 2.797 and 25 degrees of freedom which my book shows as a probability of 0.005.  The program return 0.0049933.


#include <iostream.h>
#include <math.h>
double tprob(double t, int dof)

/* Arguments:
      t : wanna evaluate integral from t to +infinity.
      dof: degrees of freedom of this t distribution.
      probability : computed value might be returned.

      function returns error code 1 if something is wrong,
      0 if all is well.

      This functional interface is not exactly what AS 3 used to be.
*/

#define ONEBYPI ((double) 0.3183098861837906715377675)
                        /* Abramowitz & Stegun, of course. */
{
      double d_dof, s, c, f, a, b, probability;
      int fk, ks, im2, ioe, k;

      if (dof < 1) return 999999;
      d_dof = (double) dof;      /* d_dof is F of fortran code. */

      a = t / sqrt(d_dof);
      b = d_dof/(d_dof + (t*t));
      im2 = dof - 2;
      ioe = dof % 2;
      s = c = f = 1.0;
      fk = ks = 2 + ioe;
      if (im2 > 2)
            for (k=ks; k<=im2; k+=2) {
                  c = c*b*(fk - 1.0)/fk;
                  s += c;
                  if (s == f) break;      /* == ? */
                  f = s;
                  fk += 2.0;
            }
      if (ioe != 1) {      /* Label 20 of fortran code. */
            probability = 0.5 + (0.5*a*sqrt(b)*s);
            return 1-probability;
      }
      else {      /* Label 30 of fortran code. */
            if (dof == 1) s = 0.0;
            probability = 0.5 + ((a*b*s + atan(a))*ONEBYPI);
            return 1-probability;
      }
}

#undef ONEBYPI
      

int main()
{
 
  cout << tprob(2.797,24) <<
         endl;
  return 0;
}
0
 
hujiAuthor Commented:
@tobydavid
You have done so much help in this regard. I'm soooooooo thankful.
I think I understand the code. I have accessed the original article explaining the Fortran code (the reference of which I pasted above) and I'm confident it is accurate enough.
@RNMcLean
Although Aitken's method is a suitable option, in my case, the function provided there is more accurate.

@all
I feel the lack of an open source .Net statistics component in my searches. Don't you know one? Lets go and start one if that is the case!

Huji
0
 
tobydavidCommented:
You are more than welcome.  I learned (and relearned) a few things in the process of helping you.

Regards,

-- tobydavid
0
 
RNMcLeanCommented:
  No worries. I also revived some recollections.
   Incidentally, in my own use of such tests, rather than issue the boiler plate "exceeds/does not exceed the 5% (or 1% or whatever) level" or even more jargonlike "significant at the 1% level" and the like, I prefer to report something such as "...the estimated probability of the effect being due to chance variation is 3%" (or whatever it is), via interpolation with the right number of degrees of freedom around the parameter value. The reader can then make decisions as to 1% level, or 10% level, as pleases. The precise phrasing would depend on just what you were testing, but why not report all the facts ("Just the facts, ma'am"), unless of course this is a part of a larger computer programme analysing thousands of cases.
0
 
hujiAuthor Commented:
>> ...but why not report all the facts...
You are very correct. And that is just what I will be doing (and have been doing) indeed; to report the p-value and let the reader think whether a .05 or .001 etc is the most appropriate cutpoint for "significance" of the difference.
Thanks for your help, everybody.
Huji
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

  • 8
  • 5
  • 2
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now