We help IT Professionals succeed at work.

# T-test

on
Medium Priority
725 Views
I need the source code of Student T-test (statistical test to compare means of two samples with normal distribution) in any of the followin program languages: C++, C#, VB, PHP, J#, or Java.
Or if you can find it in other languages (like C, Fortran or even assembly!) I need it to translated to a language I understand.
I only need the source code for comapring two independent samples, for now.
Thanks
Huji
Comment
Watch Question

## View Solutions Only

CERTIFIED EXPERT
Top Expert 2004

Commented:

Commented:
Sorry but no. That worked example uses Excel. I need the source code to code my own program, which should not be Excel-dependent.
CERTIFIED EXPERT
Top Expert 2004

Commented:
Fair enough, but if you look at the worked example above the excel one, it shows you what to do at each step of the way...

I assume this is some sort of college assignment?
Commented:
I found this algorithm that looks easy enough to translate to your language of choice.
(Source: http://lib.stat.cmu.edu/apstat/ )

REAL FUNCTION STUDNT (T, DOFF, IFAULT)
C
C     ALGORITHM AS 27  APPL. STATIST. VOL.19, NO.1
C
C     Calculate the upper tail area under Student's t-distribution
C
C     Translated from Algol by Alan Miller
C
INTEGER IFAULT
REAL T, DOFF
C
C     Local variables
C
REAL V, X, TT, TWO, FOUR, ONE, ZERO, HALF
REAL A1, A2, A3, A4, A5, B1, B2,
*     C1, C2, C3, C4, C5, D1, D2,
*     E1, E2, E3, E4, E5, F1, F2,
*     G1, G2, G3, G4, G5, H1, H2,
*     I1, I2, I3, I4, I5, J1, J2
LOGICAL POS
DATA TWO /2.0/, FOUR /4.0/, ONE /1.0/, ZERO /0.0/, HALF /0.5/
DATA A1, A2, A3, A4, A5 /0.09979441, -0.581821, 1.390993,
*     -1.222452, 2.151185/, B1, B2 /5.537409, 11.42343/
DATA C1, C2, C3, C4, C5 /0.04431742, -0.2206018, -0.03317253,
*     5.679969, -12.96519/, D1, D2 /5.166733, 13.49862/
DATA E1, E2, E3, E4, E5 /0.009694901, -0.1408854, 1.88993,
*     -12.75532, 25.77532/, F1, F2 /4.233736, 14.3963/
DATA G1, G2, G3, G4, G5 /-9.187228E-5, 0.03789901, -1.280346,
*     9.249528, -19.08115/, H1, H2 /2.777816, 16.46132/
DATA I1, I2, I3, I4, I5 /5.79602E-4, -0.02763334, 0.4517029,
*     -2.657697, 5.127212/, J1, J2 /0.5657187, 21.83269/
C
C     Check that number of degrees of freedom > 4.
C
IF (DOFF .LT. TWO) THEN
IFAULT = 1
STUDNT = - ONE
RETURN
END IF
C
IF (DOFF .LE. FOUR) THEN
IFAULT = DOFF
ELSE
IFAULT = 0
END IF
C
C     Evaluate series.
C
V = ONE / DOFF
POS = (T .GE. ZERO)
TT = ABS(T)
X = HALF * (ONE +
*    TT * (((A1 + V * (A2 + V * (A3 + V * (A4 + V * A5)))) /
*        (ONE - V * (B1 - V * B2))) +
*    TT * (((C1 + V * (C2 + V * (C3 + V * (C4 + V * C5)))) /
*        (ONE - V * (D1 - V * D2))) +
*    TT * (((E1 + V * (E2 + V * (E3 + V * (E4 + V * E5)))) /
*        (ONE - V * (F1 - V * F2))) +
*    TT * (((G1 + V * (G2 + V * (G3 + V * (G4 + V * G5)))) /
*        (ONE - V * (H1 - V * H2))) +
*    TT * ((I1 + V * (I2 + V * (I3 + V * (I4 + V * I5)))) /
*        (ONE - V * (J1 - V * J2))) ))))) ** (-8)
IF (POS) THEN
STUDNT = X
ELSE
STUDNT = ONE - X
END IF
C
RETURN
END

Not the solution you were looking for? Getting a personalized solution is easy.

Commented:
Nope this is not. This is a research project I'm carrying on. And I need the available source codes to compare with our current version and do some fixes.
And about what appears on the top section of that page: I know enough statistics to do that myself! *L* And there is a big problem: It checks the "t" statistic against a table, which is not that accurate, and this method will only result in a true/false output (either statistically significant or non-significant) while I need to get the p-value.

Commented:
@tobydavid:
Well the code is nearly what I want. I'm not sure if I really can translate it to, for example, Visual Basic, without any mistakes.
Thanks
Huji

Commented:
By the way, what language is that?

Commented:
I think it may be Fortran-77.

(Any history buffs out there?)

I am looking for other versions that may not require translation.

Commented:
An open source .Net statistics component can be a very worthful answer. :o)

Commented:
I found where someone had translated the STATLAB code into C++

http://lib.stat.cmu.edu/general/gaut.c

Hope this helps.

-- tobydavid

Commented:
The more I studies that file, the more I got certain that I can't understand it! I have previously coded in C++, but I still can't see where the code for this function is:
int tprob(double t, int dof, double *probability)
There is a ONEBYPI #defined and #defined later. Looking at the code inside, it is almost what I have done to calculate the p-value when "t" statistic is given. I'm happy it has given a direct reference for the Fortran code: Applied Statistics (1968) Vol. 17, p. 189.
I need to study more on this I think.
Huji
Commented:
The code is indeed Fortran, and the idiot who coded it hasn't bothered (or didn't know how) to use arrays, thus the tedious appearance of A1, A2, ... A5 rather than an array A, elements A(1) to A(5), and likewise C and so forth which could then be indexed appropriately rather than use tediously long and complex expressions.
By chance, I have a prog. for dealing with some different statistics (skewness and kurtosis), which works off tables via interpolation using Aitken's method. For your case, you'd modify the table to correspond to your statistic, and use Aitken's method to interpolate as needed for the specific value of your parameter. Note that my parameter involves N which might be arbitrarily large, thus I use the reciprocal, 1/N so that infinity becomes the easily accessible value, zero in my interpolations.

FUNCTION AITKEN(N,X,Y,V)
Calculate a value Y(V), given X and Y, by Aitken's method.
Cherish hopes that V is within the range of the X's (in any order).
Caution! There is little sense in going much beyond n = 5.
REAL X(*),Y(*),V,D(24),T(24)
M = MIN(N,24)
DO 1 I = 1,M
D(I) = X(I) - V
1     T(I) = Y(I)
DO 2 I = 2,M
DO 2 J = I,M
2       T(J) = (T(I - 1)*D(J) - T(J)*D(I - 1))/(X(J) - X(I - 1))
AITKEN = T(M)
RETURN
END
SUBROUTINE STATS2(NA,A,AV,SD,SKEW,KURT,SKIP0,RAVE)
PARAMETER (NTABLE = 28)
Compute skewness and kurtosis and decide upon a suitable rave.
LOGICAL SKIP0
REAL A(*)
DOUBLE PRECISION AV,SD,SKEW,KURT
DOUBLE PRECISION D,D2,EV2,EV3,EV4
CHARACTER*(*) RAVE
CHARACTER*36 R1,R2
REAL RTABLE(NTABLE),SKEW5(NTABLE),UKURT5(NTABLE),LKURT5(NTABLE)
Critical values of 5% tails from the Biometrica Tables, V1,p183-4, 1958.
DATA RTABLE/
1      7,   10,   15,   20,   25,   30,   35,   40,   45,   50,
2     70,   75,  100,  125,  150,  175,  200,  250,  300,  400,
3    500,  700, 1000, 2000, 3000, 4000, 5000, -666/
Corresponding 1/R:   .00100  50    33    25    20     0
DATA SKEW5/
1  1.008, .950, .862, .777, .714, .664, .624, .587, .558, .534,
2   .459, .445, .389, .350, .321, .298, .280, .251, .230, .200,
3   .179, .151, .127, .090, .073, .064, .057,   0 /
DATA UKURT5/
1   3.55, 3.95, 4.13, 4.17, 4.16, 4.11, 4.10, 4.06, 4.00, 3.99,
2   3.88, 3.87, 3.77, 3.71, 3.65, 3.61, 3.57, 3.52, 3.47, 3.41,
3   3.37, 3.31, 3.26, 3.18, 3.15, 3.13, 3.12,   3 /
DATA LKURT5/
1   1.41, 1.56, 1.72, 1.82, 1.91, 1.98, 2.03, 2.07, 2.11, 2.15,
2   2.25, 2.27, 2.35, 2.40, 2.45, 2.48, 2.51, 2.55, 2.59, 2.64,
3   2.67, 2.72, 2.76, 2.83, 2.86, 2.88, 2.89,   3 /
Can't expect a polynomial to follow an asymptote, so take a reciprocal.
IF (RTABLE(NTABLE) .NE. -666) GO TO 10
DO 1 I = 1,NTABLE - 1
1     RTABLE(I) = 1/RTABLE(I)
Catch n = infinity by taking its reciprocal as zero...
RTABLE(NTABLE) = 0

Calculate the parameters, assuming AV already has the average.
10   EV2 = 0.0
EV3 = 0.0
EV4 = 0.0
N = 0
DO 11 I = 1,NA
V = A(I)
IF (SKIP0 .AND. V .EQ. 0.0) GO TO 11
N = N + 1
D = V - AV
D2 = D*D
EV2 = EV2 + D2
EV3 = EV3 + D2*D
EV4 = EV4 + D2*D2
11     CONTINUE
SD = SQRT(EV2/N)
SKEW = 0
KURT = 0
IF (SD .GT. 0) THEN
SKEW = N*EV3**2 /EV2**3
IF (EV3 .LT. 0.0) SKEW = -SKEW
KURT = N*EV4/EV2**2
ENDIF

Chase after the first table value below n, in reciprocals.
20   R = 1.0/N
DO 21 I = NTABLE,1,-1
IF (R.LE.RTABLE(I)) GO TO 22
21     CONTINUE
22   I1 = MAX(1,I - 1)
I2 = MIN(NTABLE,I + 2)
NI = I2 - I1 + 1
Carefully extend the range of entries that straddle the unknown value.
23   IF (NI .LT. 4) I1 = MAX(1,I1 - 1)
NI = I2 - I1 + 1
IF (NI .LT. 4) I2 = MIN(NTABLE,I2 + 1)
NI = I2 - I1 + 1
IF (NI .LT. 4) GO TO 23
Concoct some 5% decision points. NB: the table has SQRT(skew)
30   AS  = AITKEN(NI,RTABLE(I1),SKEW5(I1),R)**2
AKL = AITKEN(NI,RTABLE(I1),LKURT5(I1),R)
AKU = AITKEN(NI,RTABLE(I1),UKURT5(I1),R)
Collect the words of wonder. 5% tails on each side.
R1 = 'not significantly skewed'
IF (SKEW.LT.-AS) R1 = 'low-value tailed'
IF (SKEW.GT. AS) R1 = 'high-value tailed'
R2 = 'mesokurtic'
IF (KURT.LT. AKL) R2 = 'platykurtic'
IF (KURT.GT. AKU) R2 = 'leptokurtic'
L1 = LASTNB(R1)
L2 = LASTNB(R2)
RAVE = R1(1:L1) // ', ' // R2(1:L2)
RETURN
END
Commented:
I am not a C++ programmer, but I did play around with the C++ code.  First, I have the function actually return the probability rather than having it passed as the third parameter.  For the one possible error, namely less than 1 degree of freedom, it returns 999999.  Second, I changed it from a lower tail probability to an upper tail which I think is more commonly expected.  (The area under one tail is just 1 minus the area under the other.)

I tested it for a variety of examples and they all matched the table at the back of my Stat text, so I am pretty confident it is a valid routine.  Here is my little console program that tests it.  In this example, I use a T-score of 2.797 and 25 degrees of freedom which my book shows as a probability of 0.005.  The program return 0.0049933.

#include <iostream.h>
#include <math.h>
double tprob(double t, int dof)

/* Arguments:
t : wanna evaluate integral from t to +infinity.
dof: degrees of freedom of this t distribution.
probability : computed value might be returned.

function returns error code 1 if something is wrong,
0 if all is well.

This functional interface is not exactly what AS 3 used to be.
*/

#define ONEBYPI ((double) 0.3183098861837906715377675)
/* Abramowitz & Stegun, of course. */
{
double d_dof, s, c, f, a, b, probability;
int fk, ks, im2, ioe, k;

if (dof < 1) return 999999;
d_dof = (double) dof;      /* d_dof is F of fortran code. */

a = t / sqrt(d_dof);
b = d_dof/(d_dof + (t*t));
im2 = dof - 2;
ioe = dof % 2;
s = c = f = 1.0;
fk = ks = 2 + ioe;
if (im2 > 2)
for (k=ks; k<=im2; k+=2) {
c = c*b*(fk - 1.0)/fk;
s += c;
if (s == f) break;      /* == ? */
f = s;
fk += 2.0;
}
if (ioe != 1) {      /* Label 20 of fortran code. */
probability = 0.5 + (0.5*a*sqrt(b)*s);
return 1-probability;
}
else {      /* Label 30 of fortran code. */
if (dof == 1) s = 0.0;
probability = 0.5 + ((a*b*s + atan(a))*ONEBYPI);
return 1-probability;
}
}

#undef ONEBYPI

int main()
{

cout << tprob(2.797,24) <<
endl;
return 0;
}

Commented:
@tobydavid
You have done so much help in this regard. I'm soooooooo thankful.
I think I understand the code. I have accessed the original article explaining the Fortran code (the reference of which I pasted above) and I'm confident it is accurate enough.
@RNMcLean
Although Aitken's method is a suitable option, in my case, the function provided there is more accurate.

@all
I feel the lack of an open source .Net statistics component in my searches. Don't you know one? Lets go and start one if that is the case!

Huji

Commented:
You are more than welcome.  I learned (and relearned) a few things in the process of helping you.

Regards,

-- tobydavid

Commented:
No worries. I also revived some recollections.
Incidentally, in my own use of such tests, rather than issue the boiler plate "exceeds/does not exceed the 5% (or 1% or whatever) level" or even more jargonlike "significant at the 1% level" and the like, I prefer to report something such as "...the estimated probability of the effect being due to chance variation is 3%" (or whatever it is), via interpolation with the right number of degrees of freedom around the parameter value. The reader can then make decisions as to 1% level, or 10% level, as pleases. The precise phrasing would depend on just what you were testing, but why not report all the facts ("Just the facts, ma'am"), unless of course this is a part of a larger computer programme analysing thousands of cases.

Commented:
>> ...but why not report all the facts...
You are very correct. And that is just what I will be doing (and have been doing) indeed; to report the p-value and let the reader think whether a .05 or .001 etc is the most appropriate cutpoint for "significance" of the difference.
Huji
##### Thanks for using Experts Exchange.

• View three pieces of content (articles, solutions, posts, and videos)
• Ask the experts questions (counted toward content limit)
• Customize your dashboard and profile