Frutasamir

Difference between two distributions

I have two tables of temperature distributions,

and I want to know whether these two distributions are significantly different.

Therefore I want to apply the two-sample Kolmogorov-Smirnov test.

Can anyone adapt the test to my needs?

thanks,

see my attached code
Regi-es.pas
TommySzalapski

I don't see why you would need to adapt the test; it does exactly what you want. The two-sample Kolmogorov-Smirnov test will tell you whether it is likely that both datasets come from the same distribution. You can plug your data in here to see the numbers (and to check your results if you still want to code it): http://www.physics.csbsju.edu/stats/KS-test.html

Try this for more info on the test if you need it: http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Kolmogorov-Smirnov_test.html

There is a Wikipedia article too, but I find the math articles there a little overly complicated and hard to understand: http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov.E2.80.93Smirnov_test
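
For quick reference, the quantity the two-sample test is built on can be written down compactly. The symbols below (F_1, F_2 for the empirical cumulative distributions of the two datasets, n_1 and n_2 for their sizes) are notation chosen for this note, not taken from the linked pages:

```latex
% Two-sample KS statistic: the largest vertical gap between the
% two empirical cumulative distribution functions.
D_{n_1,n_2} = \sup_x \left| F_{1,n_1}(x) - F_{2,n_2}(x) \right|
```

A large D relative to the sample sizes is evidence that the two datasets do not come from the same distribution.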
Frutasamir

ASKER

I'd like to implement that in code,

so that a project can run it and tell the user whether the datasets are the same or not (at the 95% confidence level).
ASKER CERTIFIED SOLUTION
Sinisa Vuk

Drawing the line worked quite well.

But how would I use the maximum distance between these distributions in the code so that, for example, a dialog box reports whether the overall difference is significant at the given confidence level?
Hello sinisav,

this is what I'm trying to apply in the function:

given the distributions, the maximum distance between them, and a confidence level, decide whether the distributions are different or virtually the same.
KS-Two.jpg
If we assume that your cumulative frequency distribution plays the role of the probability distribution in the KS test example, what is missing here? My function gives the maximum difference between the two distributions. I think the way you want to use it to accomplish your main goal is wrong. The KS difference can show you that something is wrong, but not where, and I think that is your problem. Your example images differ in size and position compared to the original one, so the test can only be used to detect that something is wrong (for example, that the average temperature is higher than in the original).
You must decide on the maximum difference between two distributions at which you still consider them "similar". This value is the threshold beyond which you must assume that "something is wrong".
If you know what is wrong here, write descriptive pseudo-code and then we can help you more.
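
One common way to pick that threshold, rather than choosing it by hand, is the large-sample critical value of the two-sample KS test. The sketch below is an approximation that assumes reasonably large samples; the symbols alpha (significance level, 0.05 for 95% confidence), n_1, n_2 (sample sizes) and the example size of 100 are illustrative assumptions, not values from this thread:

```latex
% Reject "same distribution" at level \alpha when D exceeds D_crit.
D_{\mathrm{crit}}(\alpha) = c(\alpha)\,\sqrt{\frac{n_1 + n_2}{n_1 n_2}},
\qquad
c(\alpha) = \sqrt{-\tfrac{1}{2}\ln\frac{\alpha}{2}},
\qquad
c(0.05) \approx 1.358

% Illustrative example with n_1 = n_2 = 100:
D_{\mathrm{crit}}(0.05) \approx 1.358 \cdot \sqrt{\frac{200}{10000}} \approx 0.192
```

Under these assumptions, a maximum difference above roughly 0.19 between two samples of 100 values each would be treated as significant at the 5% level.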
Hello sinisav,

there is a specific function that determines whether the maximum distance between the cumulative distributions is large enough for them to be considered different.

I'd like to implement that as well.

Please check the link below for the function and the chapter on the two-sample KS test:

http://ocw.mit.edu/courses/mathematics/18-443-statistics-for-applications-fall-2006/lecture-notes/lecture14.pdf

Something like the code below, but I'm having problems implementing it. The probks function is essential for calculating the significance level:

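Function probks(alam: double): double;
{ Kolmogorov-Smirnov significance: sum of the alternating series
  2 * sum_{j>=1} (-1)^(j-1) * exp(-2 * j^2 * alam^2) }

CONST
  eps1 = 0.001;
  eps2 = 1.0e-8;

Var
  a2, fac, sum, term, termbf: double;
  j: integer;

Begin
  a2 := -2.0*alam*alam;
  fac := 2.0;
  sum := 0.0;
  termbf := 0.0;

  for j := 1 to 600 do begin
    term := fac*exp(a2*sqr(j));
    sum := sum+term;
    if (abs(term) <= eps1*termbf) OR (abs(term) <= eps2*sum) then begin
      probks := sum;
      Exit  { converged: return here, otherwise the result is overwritten with 1.0 below }
    end;
    fac := -fac;  { alternate the sign of the next term }
    termbf := abs(term)
  end;
  probks := 1.0  { reached only if the series did not converge }
End;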

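procedure kstwo (var data1: RealArrayN12; n1: integer; var data2: RealArrayN12; n2: integer; var d, prob: real);
{ Returns the KS statistic d and its significance level prob.
  Both arrays are sorted in place. }

VAR
  j1, j2: integer;
  en1, en2, fn1, fn2, dt, d1, d2: real;
Begin
  sort(n1, data1);
  sort(n2, data2);

  en1 := n1;
  en2 := n2;
  j1 := 1;
  j2 := 1;
  fn1 := 0.0;
  fn2 := 0.0;
  d := 0.0;

  { Walk both sorted arrays, stepping the empirical CDF of whichever
    sample holds the smaller value (both on a tie). }
  WHILE (j1 <= n1) AND (j2 <= n2) DO BEGIN
    d1 := data1[j1];
    d2 := data2[j2];

    IF d1 <= d2 THEN BEGIN
      fn1 := j1/en1;
      j1 := j1+1
    END;

    IF d2 <= d1 THEN BEGIN
      fn2 := j2/en2;
      j2 := j2+1
    END;

    dt := abs(fn2-fn1);
    IF dt > d THEN d := dt  { keep the largest gap seen so far }
  END;

  prob := probks(sqrt(en1*en2/(en1+en2))*d)
End;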


ks-test.jpg
How would I handle the case where the maximum distance is, for example, 1? Thanks
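
A minimal sketch of how the final decision could look, including the case where the maximum distance is 1. The names QKS and KSDiffers and the sample sizes are hypothetical, chosen for illustration (12 is only a guess based on the RealArrayN12 type name). QKS is a compact stand-in for the same series that probks sums, and the whole thing relies on the large-sample approximation, so it is rough for very small datasets:

```pascal
program KSDecision;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

{ Hypothetical helper: asymptotic KS tail probability
  Q(lambda) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 * j^2 * lambda^2). }
function QKS(lambda: Double): Double;
var
  j: Integer;
  term, sum, sign: Double;
begin
  { For very small lambda the probability is practically 1. }
  if lambda < 0.2 then
  begin
    Result := 1.0;
    Exit;
  end;
  sum := 0.0;
  sign := 1.0;
  for j := 1 to 100 do
  begin
    term := sign * 2.0 * Exp(-2.0 * j * j * lambda * lambda);
    sum := sum + term;
    if Abs(term) < 1.0e-12 then Break;  { remaining terms are negligible }
    sign := -sign;
  end;
  Result := sum;
end;

{ Hypothetical helper: True when the maximum distance d between the two
  cumulative distributions is significant at level alpha
  (alpha = 0.05 corresponds to 95% confidence). }
function KSDiffers(d: Double; n1, n2: Integer; alpha: Double): Boolean;
var
  ne: Double;
begin
  ne := (1.0 * n1 * n2) / (n1 + n2);  { effective sample size }
  Result := QKS(Sqrt(ne) * d) < alpha;
end;

begin
  { Maximum distance of 1 with two assumed samples of 12 values each. }
  if KSDiffers(1.0, 12, 12, 0.05) then
    WriteLn('significantly different')
  else
    WriteLn('no significant difference');
end.
```

Note that the distance alone does not settle the question: the same d = 1 with only two values per sample gives a probability of about 0.27 under this approximation, which is not significant at the 5% level. In a GUI project the WriteLn calls could be replaced by a dialog box.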
I am very busy lately, but in a few days I will try to adapt this example. Sorry.
OK. Implementing this function and procedure is very important.
Got it working, never mind the last questions.

thanks.