Frutasamir
asked on
Difference between two distributions
I have two tables of temperature distributions,
and I want to know whether these two distributions are significantly different.
I therefore want to apply the two-sample Kolmogorov-Smirnov test.
Can anyone adapt the test to my needs?
Thanks,
see my attached code:
Regi-es.pas
ASKER
I'd like to implement that in code,
as part of a project that runs the test and tells the user whether the two datasets are the same or not (at the 95% confidence level).
ASKER
Drawing the line worked quite well.
But how would I use the maximum distance between these distributions in code so that, for example, a dialog box reports whether the overall difference is significant at the given confidence level?
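One possible way to turn the maximum distance into a yes/no answer is to compare it with the asymptotic critical value of the two-sample KS test, c(alpha) * sqrt((n1 + n2) / (n1 * n2)) with c(alpha) = sqrt(-0.5 * ln(alpha / 2)). The sketch below assumes that approach; KSCriticalD and DistributionsDiffer are hypothetical helper names, and the distance 0.60 and the sample sizes of 12 are made-up inputs, not values from the attached data.

```pascal
program KSDecisionSketch;
{$IFDEF FPC}{$MODE OBJFPC}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

{ Asymptotic critical value of the two-sample KS distance at level alpha.
  Hypothetical helper for illustration, not part of the posted code. }
function KSCriticalD(n1, n2: Integer; alpha: Double): Double;
var
  c: Double;
begin
  c := Sqrt(-0.5 * Ln(alpha / 2.0));          { about 1.358 for alpha = 0.05 }
  Result := c * Sqrt((n1 + n2) / (n1 * n2));
end;

{ True when the observed maximum distance d exceeds the critical value. }
function DistributionsDiffer(d: Double; n1, n2: Integer; alpha: Double): Boolean;
begin
  Result := d > KSCriticalD(n1, n2, alpha);
end;

begin
  { Assumed example inputs: d = 0.60, 12 values per sample, alpha = 0.05 }
  if DistributionsDiffer(0.60, 12, 12, 0.05) then
    WriteLn('The distributions differ significantly at the 95% level.')
  else
    WriteLn('No significant difference at the 95% level.');
end.
```

In a VCL application the WriteLn calls could be swapped for ShowMessage from the Dialogs unit to get the dialog box. The critical value is a large-sample approximation, so for very small samples the result is only a rough guide.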
ASKER
Hello sinisav,
this is what I'm trying to apply in the function:
given the two distributions, the maximum distance between them and a confidence level, the distributions are judged either different or virtually the same.
KS-Two.jpg
If we assume that your cumulative frequency distribution plays the role of the probability distribution in the KS test example, what is missing here? My function gives the maximum difference between the two distributions. I think the way you want to use it will not accomplish your main goal: the KS difference can show you that something is wrong, but not where, and I think that is your real problem. Your example images differ in size and position from the original one, so the test can only be used to detect that something is wrong (for example, that the average temperature is higher than in the original).
You must decide the maximum difference between two distributions at which you still consider them "similar". That value is the trigger beyond which you must assume that "something is wrong".
If you know what is wrong here, write descriptive pseudo-code and then we can help you more.
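As a rough illustration of what "the maximum difference of two distributions" means for tables of counts, the sketch below accumulates two histograms into cumulative relative frequencies and returns the largest gap. It is not the function from the hidden solution: MaxCdfDistance is a hypothetical name, the counts are made up, and both tables are assumed to use the same bins.

```pascal
program MaxCdfDistanceSketch;
{$IFDEF FPC}{$MODE OBJFPC}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

{ Largest absolute gap between the cumulative relative frequencies of two
  histograms. Assumes h1 and h2 have the same length and the same bin edges.
  Hypothetical helper for illustration only. }
function MaxCdfDistance(const h1, h2: array of Integer): Double;
var
  i, total1, total2, run1, run2: Integer;
  gap: Double;
begin
  Result := 0.0;
  total1 := 0;
  total2 := 0;
  for i := 0 to High(h1) do
  begin
    total1 := total1 + h1[i];
    total2 := total2 + h2[i];
  end;
  if (total1 = 0) or (total2 = 0) then
    Exit;                                   { an empty table has no distribution }
  run1 := 0;
  run2 := 0;
  for i := 0 to High(h1) do
  begin
    run1 := run1 + h1[i];                   { running counts up to bin i }
    run2 := run2 + h2[i];
    gap := Abs(run1 / total1 - run2 / total2);
    if gap > Result then
      Result := gap;
  end;
end;

begin
  { Made-up counts per temperature bin; prints 0.3500 }
  WriteLn(MaxCdfDistance([2, 5, 8, 4, 1], [0, 2, 6, 8, 4]):0:4);
end.
```

Because the counts are binned, a KS significance level computed from this distance is only approximate.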
ASKER
Hello sinisav,
there is a specific function that determines whether the maximum distance between the cumulative distributions is large enough for them to be considered different;
I'd like to implement that as well.
Please check the link below for the function and the chapter on the two-sample KS test:
http://ocw.mit.edu/courses/mathematics/18-443-statistics-for-applications-fall-2006/lecture-notes/lecture14.pdf
Something like the code below, but I am having problems implementing it. The probks function is essential for calculating the significance level:
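function probks(alam: double): double;
{ Significance level of the KS statistic: sums the alternating series
  2 * exp(-2*alam^2) - 2 * exp(-8*alam^2) + ... until it converges. }
const
  eps1 = 0.001;   { stop when a term is small relative to the previous term }
  eps2 = 1.0e-8;  { or small relative to the running sum }
var
  a2, fac, sum, term, termbf: real;
  j: integer;
begin
  a2 := -2.0 * alam * alam;
  fac := 2.0;
  sum := 0.0;
  termbf := 0.0;
  for j := 1 to 600 do
  begin
    term := fac * exp(a2 * sqr(j));
    sum := sum + term;
    if (abs(term) <= eps1 * termbf) or (abs(term) <= eps2 * sum) then
    begin
      probks := sum;
      exit  { converged: return here, otherwise the result is overwritten with 1.0 below }
    end
    else
    begin
      fac := -fac;  { the series alternates in sign }
      termbf := abs(term)
    end
  end;
  probks := 1.0  { reached only if the series did not converge }
end;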
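procedure kstwo(var data1: RealArrayN12; n1: integer; var data2: RealArrayN12; n2: integer; var d, prob: real);
{ Two-sample KS test. On return, d is the maximum distance between the two
  empirical cumulative distributions and prob is its significance level;
  a small prob means the two samples differ significantly. }
var
  i, j1, j2: integer;
  en1, en2, fn1, fn2, dt, d1, d2: real;
begin
  sort(n1, data1);
  sort(n2, data2);
  en1 := n1;
  en2 := n2;
  j1 := 1;
  j2 := 1;
  fn1 := 0.0;
  fn2 := 0.0;
  d := 0.0;
  { Walk through both sorted samples in step }
  while (j1 <= n1) and (j2 <= n2) do
  begin
    d1 := data1[j1];
    d2 := data2[j2];
    if d1 <= d2 then
    begin
      fn1 := j1 / en1;  { cumulative fraction of sample 1 }
      j1 := j1 + 1;
    end;
    if d2 <= d1 then
    begin
      fn2 := j2 / en2;  { cumulative fraction of sample 2 }
      j2 := j2 + 1;
    end;
    dt := abs(fn2 - fn1);
    if dt > d then d := dt
  end;
  prob := probks(sqrt(en1 * en2 / (en1 + en2)) * d)
end;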
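For reference, the posted code appears to evaluate the standard asymptotic Kolmogorov series, with the argument built in the last line of kstwo. Written out, with D standing for d and n_1, n_2 for the two sample sizes (the symbol names are chosen here for illustration):

```latex
\lambda = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\; D,
\qquad
\mathrm{prob} = Q_{KS}(\lambda) = 2 \sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^{2} \lambda^{2}}
```

A small prob, for example below 0.05, then indicates that the two distributions differ at the 95% level.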
ks-test.jpg
ASKER
How would I implement this if the maximum distance is, for example, 1? Thanks
I am very busy lately, but in a few days I will try to adapt this example. Sorry.
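As a worked example of that case, assume both samples hold 12 values (an assumption for illustration; the real sizes are not given) and d = 1. Following the last line of kstwo, only the first term of the probks series matters:

```latex
\lambda = \sqrt{\frac{12 \cdot 12}{12 + 12}} \cdot 1 = \sqrt{6} \approx 2.449,
\qquad
\mathrm{prob} \approx 2\, e^{-2 \lambda^{2}} = 2\, e^{-12} \approx 1.2 \times 10^{-5}
```

That is far below 0.05, so under these assumptions the two distributions would be reported as significantly different.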
ASKER
OK, implementing this function and procedure is very important.
ASKER
Got it working, never mind the last questions.
thanks.
Try this for more info on the test if you need it: http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Kolmogorov-Smirnov_test.html
There is a Wikipedia article too, but I find the math articles there a little overly complicated and hard to understand: http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov.E2.80.93Smirnov_test