haravallabhan
asked on
Performing statistical analysis (Kolmogorov-Smirnov test or related) using R, Matlab, or Excel
I have two Excel files: one containing
a set of marker values with their positions (MARKERID-KS.xls), and
another containing a set of region values with their positions on a map (MAPID-KS.xls), along with their color intensity.
Please see the attached Excel files.
I want to run a statistical test to determine whether the relationship between the distributions of these two datasets is significant, i.e. marker to map (in relation to the color intensity given in the map). I would like to use a Kolmogorov-Smirnov test (KS test), or please advise me if another test would be more suitable, to determine the D value and to find out whether there is a significant difference between the two datasets. Can someone please advise me on how to perform this? Is there a straightforward way to run this test on the data using R or Matlab? (I haven't used Matlab but could try.)
Thanks
Cheers
Sample of the data
DATA1: MARKERID
MarkerID  Marker  BeginPosition  EndPosition
MARK1     IC      6135           6046
MARK2     IC      22428          22440
MARK3     IC      23665          23684
MARK4     IC      30394          30402
MARK5     IC      30961          30964
MARK6     IC      33439          33455
MARK7     IC      36905          36906
MARK8     IC      42219          42234
DATA2: MAPID
MAPID   REGION  BEGIN  END    COLOR INTENSITY
COLID1  IC      60639  63226  5.89
COLID2  IC      42039  42259  5.47
COLID3  IC      42626  43386  5.2
COLID4  IC      63200  63369  4.52
COLID5  IC      30699  32083  3.97
COLID6  IC      66360  66555  3.9
MARKERID-KS.xls
MAPID-KS.xls
ASKER
Hi,
You are absolutely right, and thanks for clarifying. Because the marker and map entries each have a begin and an end position, the data analysis has been challenging, and I was hoping there might be a way to do this analysis that takes both the begin and end positions into account. I will get back in a few hours with a detailed explanation of the data.
Thanks for the input.
ASKER
Hi,
Sorry for getting back late.
What I am trying to test is whether the marker positions have any significant relationship to the positions on the map. We can ignore the colour intensity for now.
The questions I am trying to address are:
1) Do these two distributions have any significant relationship to each other?
2) Does the marker position (either begin or end) correlate with the begin or end position of the map regions?
3) Considering that both distributions come from a random distribution, how do I establish significance, i.e. how do I do it using the KS test, and are there other statistical tests that could detect a significant relationship between these two datasets?
4) Given that the data are in Excel files, how do I perform the test in R? (I have been out of touch with R for some time.)
I would be happy to consider using just one column, e.g. the end position of the marker and the end position of the map, to get these statistics. Additionally, I have the distance from the begin position of each marker to the end position of the map (which is my main concern), and I want to establish whether there is any significance in its distribution or pattern and whether the two datasets are correlated.
I am unsure how to go about this statistically and was advised by someone to try a KS test. Could you shed some light on this?
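For the distance question, here is a minimal sketch of one way the marker-to-region relationship could be tabulated before any test is chosen. It is written in Python purely for illustration (the same logic carries over to R), it uses only the sample rows posted above, and the helper names `nearest_region_end` and `overlapping_regions` are made up for this example. It also assumes that a marker listed with begin greater than end (MARK1) should simply be treated as the interval between the two values.

```python
# Sample rows from the question: id -> (begin, end). Not the full dataset.
markers = {
    "MARK1": (6135, 6046), "MARK2": (22428, 22440), "MARK3": (23665, 23684),
    "MARK4": (30394, 30402), "MARK5": (30961, 30964), "MARK6": (33439, 33455),
    "MARK7": (36905, 36906), "MARK8": (42219, 42234),
}
regions = {
    "COLID1": (60639, 63226), "COLID2": (42039, 42259), "COLID3": (42626, 43386),
    "COLID4": (63200, 63369), "COLID5": (30699, 32083), "COLID6": (66360, 66555),
}

def nearest_region_end(marker_begin, regions):
    """Return (region_id, distance) for the region whose END is closest to marker_begin.

    The distance is signed: positive when the region end lies after the marker begin.
    """
    region_id = min(regions, key=lambda r: abs(regions[r][1] - marker_begin))
    return region_id, regions[region_id][1] - marker_begin

def overlapping_regions(begin, end, regions):
    """List the regions whose [begin, end] interval overlaps the marker interval."""
    lo, hi = min(begin, end), max(begin, end)  # assumption: ignore orientation
    return [r for r, (r_begin, r_end) in regions.items()
            if r_begin <= hi and lo <= r_end]

for marker_id, (begin, end) in markers.items():
    region_id, distance = nearest_region_end(begin, regions)
    hits = overlapping_regions(begin, end, regions)
    print(marker_id, region_id, distance, hits)
```

The resulting list of distances is a single column of numbers, which is the kind of input a one-sample or two-sample distribution test expects. Whether "nearest region end" is the right pairing for the real data is a judgment call that depends on what the map represents.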
ASKER CERTIFIED SOLUTION
ASKER
Thank you very much. This is much clearer. The data in the file is not the complete, real data; I made it up to get clarity on my analysis. Can I please ask how you generated this graph, i.e. which software did you use? Is this MATLAB?
I do not understand your data. The test you are referring to compares two distributions: which value constitutes the distribution you want to test? Apparently, it can be either “begin position” or “end position”, but not both in the same test. The test cannot be “in relation to the colour intensity”, because that value is missing for markers.
Please explain a little more about the data and what exactly you are trying to test, not necessarily using statistical terms.
(°v°)
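To illustrate the point about the test comparing one value per dataset, here is a minimal sketch of the two-sample KS statistic D, computed on the begin positions only. It is plain Python for illustration and not the method from any of the answers; the function name `ks_two_sample_d` is made up for this example, and the numbers are the sample rows from the question rather than the real data. In R, the built-in `ks.test(x, y)` reports the same D together with a p-value.

```python
from bisect import bisect_right

def ks_two_sample_d(x, y):
    """Two-sample KS statistic D: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so it is enough to compare them at every observed value.
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)  # fraction of x values <= v
        fy = bisect_right(ys, v) / len(ys)  # fraction of y values <= v
        d = max(d, abs(fx - fy))
    return d

# Begin positions copied from the sample tables in the question.
marker_begin = [6135, 22428, 23665, 30394, 30961, 33439, 36905, 42219]
map_begin = [60639, 42039, 42626, 63200, 30699, 66360]

d_stat = ks_two_sample_d(marker_begin, map_begin)
print(d_stat)
```

This sketch only produces D. Judging significance also needs a p-value, and with 8 and 6 observations any conclusion would be very weak. Note too that the test says whether the two sets of positions look like draws from the same distribution, not whether individual markers fall inside or near individual regions.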