Has a worked example...is that good enough?

Solved

Posted on 2006-04-05

I need the source code of Student T-test (statistical test to compare means of two samples with normal distribution) in any of the followin program languages: C++, C#, VB, PHP, J#, or Java.

Or if you can find it in other languages (like C, Fortran or even assembly!) I need it to translated to a language I understand.

I only need the source code for comapring two independent samples, for now.

Thanks

Huji

Or if you can find it in other languages (like C, Fortran or even assembly!) I need it to translated to a language I understand.

I only need the source code for comapring two independent samples, for now.

Thanks

Huji

17 Comments

Has a worked example...is that good enough?

I assume this is some sort of college assignment?

(Source: http://lib.stat.cmu.edu/ap

REAL FUNCTION STUDNT (T, DOFF, IFAULT)

C

C ALGORITHM AS 27 APPL. STATIST. VOL.19, NO.1

C

C Calculate the upper tail area under Student's t-distribution

C

C Translated from Algol by Alan Miller

C

INTEGER IFAULT

REAL T, DOFF

C

C Local variables

C

REAL V, X, TT, TWO, FOUR, ONE, ZERO, HALF

REAL A1, A2, A3, A4, A5, B1, B2,

* C1, C2, C3, C4, C5, D1, D2,

* E1, E2, E3, E4, E5, F1, F2,

* G1, G2, G3, G4, G5, H1, H2,

* I1, I2, I3, I4, I5, J1, J2

LOGICAL POS

DATA TWO /2.0/, FOUR /4.0/, ONE /1.0/, ZERO /0.0/, HALF /0.5/

DATA A1, A2, A3, A4, A5 /0.09979441, -0.581821, 1.390993,

* -1.222452, 2.151185/, B1, B2 /5.537409, 11.42343/

DATA C1, C2, C3, C4, C5 /0.04431742, -0.2206018, -0.03317253,

* 5.679969, -12.96519/, D1, D2 /5.166733, 13.49862/

DATA E1, E2, E3, E4, E5 /0.009694901, -0.1408854, 1.88993,

* -12.75532, 25.77532/, F1, F2 /4.233736, 14.3963/

DATA G1, G2, G3, G4, G5 /-9.187228E-5, 0.03789901, -1.280346,

* 9.249528, -19.08115/, H1, H2 /2.777816, 16.46132/

DATA I1, I2, I3, I4, I5 /5.79602E-4, -0.02763334, 0.4517029,

* -2.657697, 5.127212/, J1, J2 /0.5657187, 21.83269/

C

C Check that number of degrees of freedom > 4.

C

IF (DOFF .LT. TWO) THEN

IFAULT = 1

STUDNT = - ONE

RETURN

END IF

C

IF (DOFF .LE. FOUR) THEN

IFAULT = DOFF

ELSE

IFAULT = 0

END IF

C

C Evaluate series.

C

V = ONE / DOFF

POS = (T .GE. ZERO)

TT = ABS(T)

X = HALF * (ONE +

* TT * (((A1 + V * (A2 + V * (A3 + V * (A4 + V * A5)))) /

* (ONE - V * (B1 - V * B2))) +

* TT * (((C1 + V * (C2 + V * (C3 + V * (C4 + V * C5)))) /

* (ONE - V * (D1 - V * D2))) +

* TT * (((E1 + V * (E2 + V * (E3 + V * (E4 + V * E5)))) /

* (ONE - V * (F1 - V * F2))) +

* TT * (((G1 + V * (G2 + V * (G3 + V * (G4 + V * G5)))) /

* (ONE - V * (H1 - V * H2))) +

* TT * ((I1 + V * (I2 + V * (I3 + V * (I4 + V * I5)))) /

* (ONE - V * (J1 - V * J2))) ))))) ** (-8)

IF (POS) THEN

STUDNT = X

ELSE

STUDNT = ONE - X

END IF

C

RETURN

END

And about what appears on the top section of that page: I know enough statistics to do that myself! *L* And there is a big problem: It checks the "t" statistic against a table, which is not that accurate, and this method will only result in a true/false output (either statistically significant or non-significant) while I need to get the p-value.

Well the code is nearly what I want. I'm not sure if I really can translate it to, for example, Visual Basic, without any mistakes.

Thanks

Huji

(Any history buffs out there?)

I am looking for other versions that may not require translation.

http://lib.stat.cmu.edu/ge

Hope this helps.

-- tobydavid

int tprob(double t, int dof, double *probability)

There is a ONEBYPI #defined and #defined later. Looking at the code inside, it is almost what I have done to calculate the p-value when "t" statistic is given. I'm happy it has given a direct reference for the Fortran code: Applied Statistics (1968) Vol. 17, p. 189.

I need to study more on this I think.

Huji

By chance, I have a prog. for dealing with some different statistics (skewness and kurtosis), which works off tables via interpolation using Aitken's method. For your case, you'd modify the table to correspond to your statistic, and use Aitken's method to interpolate as needed for the specific value of your parameter. Note that my parameter involves N which might be arbitrarily large, thus I use the reciprocal, 1/N so that infinity becomes the easily accessible value, zero in my interpolations.

FUNCTION AITKEN(N,X,Y,V)

Calculate a value Y(V), given X and Y, by Aitken's method.

Cherish hopes that V is within the range of the X's (in any order).

Caution! There is little sense in going much beyond n = 5.

REAL X(*),Y(*),V,D(24),T(24)

M = MIN(N,24)

DO 1 I = 1,M

D(I) = X(I) - V

1 T(I) = Y(I)

DO 2 I = 2,M

DO 2 J = I,M

2 T(J) = (T(I - 1)*D(J) - T(J)*D(I - 1))/(X(J) - X(I - 1))

AITKEN = T(M)

RETURN

END

SUBROUTINE STATS2(NA,A,AV,SD,SKEW,KUR

PARAMETER (NTABLE = 28)

Compute skewness and kurtosis and decide upon a suitable rave.

LOGICAL SKIP0

REAL A(*)

DOUBLE PRECISION AV,SD,SKEW,KURT

DOUBLE PRECISION D,D2,EV2,EV3,EV4

CHARACTER*(*) RAVE

CHARACTER*36 R1,R2

REAL RTABLE(NTABLE),SKEW5(NTABL

Critical values of 5% tails from the Biometrica Tables, V1,p183-4, 1958.

DATA RTABLE/

1 7, 10, 15, 20, 25, 30, 35, 40, 45, 50,

2 70, 75, 100, 125, 150, 175, 200, 250, 300, 400,

3 500, 700, 1000, 2000, 3000, 4000, 5000, -666/

Corresponding 1/R: .00100 50 33 25 20 0

DATA SKEW5/

1 1.008, .950, .862, .777, .714, .664, .624, .587, .558, .534,

2 .459, .445, .389, .350, .321, .298, .280, .251, .230, .200,

3 .179, .151, .127, .090, .073, .064, .057, 0 /

DATA UKURT5/

1 3.55, 3.95, 4.13, 4.17, 4.16, 4.11, 4.10, 4.06, 4.00, 3.99,

2 3.88, 3.87, 3.77, 3.71, 3.65, 3.61, 3.57, 3.52, 3.47, 3.41,

3 3.37, 3.31, 3.26, 3.18, 3.15, 3.13, 3.12, 3 /

DATA LKURT5/

1 1.41, 1.56, 1.72, 1.82, 1.91, 1.98, 2.03, 2.07, 2.11, 2.15,

2 2.25, 2.27, 2.35, 2.40, 2.45, 2.48, 2.51, 2.55, 2.59, 2.64,

3 2.67, 2.72, 2.76, 2.83, 2.86, 2.88, 2.89, 3 /

Can't expect a polynomial to follow an asymptote, so take a reciprocal.

IF (RTABLE(NTABLE) .NE. -666) GO TO 10

DO 1 I = 1,NTABLE - 1

1 RTABLE(I) = 1/RTABLE(I)

Catch n = infinity by taking its reciprocal as zero...

RTABLE(NTABLE) = 0

Calculate the parameters, assuming AV already has the average.

10 EV2 = 0.0

EV3 = 0.0

EV4 = 0.0

N = 0

DO 11 I = 1,NA

V = A(I)

IF (SKIP0 .AND. V .EQ. 0.0) GO TO 11

N = N + 1

D = V - AV

D2 = D*D

EV2 = EV2 + D2

EV3 = EV3 + D2*D

EV4 = EV4 + D2*D2

11 CONTINUE

SD = SQRT(EV2/N)

SKEW = 0

KURT = 0

IF (SD .GT. 0) THEN

SKEW = N*EV3**2 /EV2**3

IF (EV3 .LT. 0.0) SKEW = -SKEW

KURT = N*EV4/EV2**2

ENDIF

Chase after the first table value below n, in reciprocals.

20 R = 1.0/N

DO 21 I = NTABLE,1,-1

IF (R.LE.RTABLE(I)) GO TO 22

21 CONTINUE

22 I1 = MAX(1,I - 1)

I2 = MIN(NTABLE,I + 2)

NI = I2 - I1 + 1

Carefully extend the range of entries that straddle the unknown value.

23 IF (NI .LT. 4) I1 = MAX(1,I1 - 1)

NI = I2 - I1 + 1

IF (NI .LT. 4) I2 = MIN(NTABLE,I2 + 1)

NI = I2 - I1 + 1

IF (NI .LT. 4) GO TO 23

Concoct some 5% decision points. NB: the table has SQRT(skew)

30 AS = AITKEN(NI,RTABLE(I1),SKEW5

AKL = AITKEN(NI,RTABLE(I1),LKURT

AKU = AITKEN(NI,RTABLE(I1),UKURT

Collect the words of wonder. 5% tails on each side.

R1 = 'not significantly skewed'

IF (SKEW.LT.-AS) R1 = 'low-value tailed'

IF (SKEW.GT. AS) R1 = 'high-value tailed'

R2 = 'mesokurtic'

IF (KURT.LT. AKL) R2 = 'platykurtic'

IF (KURT.GT. AKU) R2 = 'leptokurtic'

L1 = LASTNB(R1)

L2 = LASTNB(R2)

RAVE = R1(1:L1) // ', ' // R2(1:L2)

RETURN

END

I tested it for a variety of examples and they all matched the table at the back of my Stat text, so I am pretty confident it is a valid routine. Here is my little console program that tests it. In this example, I use a T-score of 2.797 and 25 degrees of freedom which my book shows as a probability of 0.005. The program return 0.0049933.

#include <iostream.h>

#include <math.h>

double tprob(double t, int dof)

/* Arguments:

t : wanna evaluate integral from t to +infinity.

dof: degrees of freedom of this t distribution.

probability : computed value might be returned.

function returns error code 1 if something is wrong,

0 if all is well.

This functional interface is not exactly what AS 3 used to be.

*/

#define ONEBYPI ((double) 0.318309886183790671537767

/* Abramowitz & Stegun, of course. */

{

double d_dof, s, c, f, a, b, probability;

int fk, ks, im2, ioe, k;

if (dof < 1) return 999999;

d_dof = (double) dof; /* d_dof is F of fortran code. */

a = t / sqrt(d_dof);

b = d_dof/(d_dof + (t*t));

im2 = dof - 2;

ioe = dof % 2;

s = c = f = 1.0;

fk = ks = 2 + ioe;

if (im2 > 2)

for (k=ks; k<=im2; k+=2) {

c = c*b*(fk - 1.0)/fk;

s += c;

if (s == f) break; /* == ? */

f = s;

fk += 2.0;

}

if (ioe != 1) { /* Label 20 of fortran code. */

probability = 0.5 + (0.5*a*sqrt(b)*s);

return 1-probability;

}

else { /* Label 30 of fortran code. */

if (dof == 1) s = 0.0;

probability = 0.5 + ((a*b*s + atan(a))*ONEBYPI);

return 1-probability;

}

}

#undef ONEBYPI

int main()

{

cout << tprob(2.797,24) <<

endl;

return 0;

}

You have done so much help in this regard. I'm soooooooo thankful.

I think I understand the code. I have accessed the original article explaining the Fortran code (the reference of which I pasted above) and I'm confident it is accurate enough.

@RNMcLean

Although Aitken's method is a suitable option, in my case, the function provided there is more accurate.

@all

I feel the lack of an open source .Net statistics component in my searches. Don't you know one? Lets go and start one if that is the case!

Huji

Regards,

-- tobydavid

Incidentally, in my own use of such tests, rather than issue the boiler plate "exceeds/does not exceed the 5% (or 1% or whatever) level" or even more jargonlike "significant at the 1% level" and the like, I prefer to report something such as "...the estimated probability of the effect being due to chance variation is 3%" (or whatever it is), via interpolation with the right number of degrees of freedom around the parameter value. The reader can then make decisions as to 1% level, or 10% level, as pleases. The precise phrasing would depend on just what you were testing, but why not report all the facts ("Just the facts, ma'am"), unless of course this is a part of a larger computer programme analysing thousands of cases.

You are very correct. And that is just what I will be doing (and have been doing) indeed; to report the p-value and let the reader think whether a .05 or .001 etc is the most appropriate cutpoint for "significance" of the difference.

Thanks for your help, everybody.

Huji

By clicking you are agreeing to Experts Exchange's Terms of Use.

Title | # Comments | Views | Activity |
---|---|---|---|

copyEvens challenge | 6 | 46 | |

countHi challenge | 25 | 59 | |

array11 challenge | 16 | 40 | |

Maximum possible 10 character string combinations from 36 unique characters | 18 | 29 |

A short article about a problem I had getting the GPS LocationListener working.

Join the community of 500,000 technology professionals and ask your questions.

Connect with top rated Experts

**7** Experts available now in Live!