Performing statistical analysis (Kolmogorov-Smirnov test or related) using R, MATLAB, or Excel

I have two Excel files: one contains a set of markers with their positions (MARKERID-KS.xls), and the other contains a set of regions with their positions on a map, along with their color intensity (MAPID-KS.xls).
Please see the attached Excel files.

I want to run a statistical test to determine whether the relationship between the distributions of these two datasets is significant, i.e. marker to map (in relation to the color intensity given in the map). I would like to do a Kolmogorov-Smirnov test (KS test) on this sample to determine the D value and to find out whether there is any significance between the two datasets; please advise me if there is another test I could use instead. Can someone please advise me on how to perform this? Is there a straightforward way to run this test on the data using R or MATLAB (I haven't used MATLAB but could try)?

Thanks

Cheers

Sample of the data
DATA1: MARKERID
MarkerID   Marker   BeginPosition   EndPosition
MARK1      IC       6135            6046
MARK2      IC       22428           22440
MARK3      IC       23665           23684
MARK4      IC       30394           30402
MARK5      IC       30961           30964
MARK6      IC       33439           33455
MARK7      IC       36905           36906
MARK8      IC       42219           42234

DATA2: MAPID
MAPID      REGION   BEGIN   END     COLOR INTENSITY
COLID1     IC       60639   63226   5.89
COLID2     IC       42039   42259   5.47
COLID3     IC       42626   43386   5.2
COLID4     IC       63200   63369   4.52
COLID5     IC       30699   32083   3.97
COLID6     IC       66360   66555   3.9

MARKERID-KS.xls
MAPID-KS.xls

Markus Fischer

Hi,

I do not understand your data. The test you are referring to compares two distributions: which value constitutes the distribution you want to test? Apparently, it can be either “begin position” or “end position”, but not both in the same test. The test cannot be “in relation to the colour intensity”, because that value is missing for markers.

Please explain a little more about the data and what exactly you are trying to test, not necessarily using statistical terms.
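To make the "one value per dataset" point concrete, here is a minimal sketch of a two-sample KS comparison using only the begin positions from the sample rows posted above. It is written in plain Python for illustration; the function name `ks_two_sample` is hypothetical, and the p-value uses the large-sample (asymptotic) Kolmogorov series, which is only a rough guide for samples this small. In R, `ks.test(x, y)` from the stats package performs the same two-sample test in one call.

```python
from bisect import bisect_right
import math

def ks_two_sample(a, b):
    """Return (D, approximate p-value) for a two-sample KS test.

    D is the largest vertical gap between the two empirical CDFs.
    The p-value is the asymptotic approximation, not an exact one.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # The ECDFs only change at observed values, so checking those is enough.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series below converges slowly here; the true value is ~1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Begin positions copied from the sample data in the question.
marker_begin = [6135, 22428, 23665, 30394, 30961, 33439, 36905, 42219]
map_begin = [60639, 42039, 42626, 63200, 30699, 66360]

d_value, p_value = ks_two_sample(marker_begin, map_begin)
print(f"D = {d_value:.4f}, approximate p = {p_value:.4f}")
```

The same call works for the end positions; what the test cannot do is take begin and end positions together as a single sample.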

(°v°)

haravallabhan

ASKER

Hi,

You are absolutely right, and thanks for clarifying. Because both the markers and the map regions have begin and end positions, the analysis has been challenging, and I was hoping there might be a way to do it that takes both the begin and end positions into account. I will get back in a few hours with a detailed explanation of the data.

Thanks for the input.

haravallabhan

ASKER

Hi,

Sorry to get back late.

What I am trying to test is whether the marker positions have any significance in relation to the positions on the map. We can ignore the colour intensity for now.

The questions I am trying to address are:

1) Do these two distributions have any significant relationship to each other?
2) Does the marker position (either begin or end) correlate with the begin or end position on the map?
3) Given that both distributions come from a random distribution, how do I establish that there is significance, i.e. how do I do it using the KS test, and are there any other statistical tests that could be used to find any significance between these two datasets?
4) Given that the data are in Excel files, how do I perform the test in R? (I have been out of touch with R for some time.)

I would be happy to use just one column from each file, e.g. the end position of the markers and the end position of the map regions, to get these statistics. Additionally, I have the distance from the begin position of each marker to the end position of the map region (which is my main concern), and I want to establish whether there is any significance in its distribution or pattern and whether the two datasets are correlated.

I am unsure how to go about this statistically and was advised by someone to try a KS test. Could you shed some light on this?
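As a rough illustration of the distance idea described above, the sketch below uses the sample rows from the question to compute, for each marker, the distance from its begin position to the nearest map region end, and to list the markers that lie entirely inside a region. It is plain Python rather than R, the helper names are hypothetical, and "nearest region end" is only one assumed reading of the distance in question; it is a starting point for exploring the data, not a significance test.

```python
# Rows copied from the sample data: (id, begin, end).
markers = [
    ("MARK1", 6135, 6046), ("MARK2", 22428, 22440),
    ("MARK3", 23665, 23684), ("MARK4", 30394, 30402),
    ("MARK5", 30961, 30964), ("MARK6", 33439, 33455),
    ("MARK7", 36905, 36906), ("MARK8", 42219, 42234),
]
regions = [
    ("COLID1", 60639, 63226), ("COLID2", 42039, 42259),
    ("COLID3", 42626, 43386), ("COLID4", 63200, 63369),
    ("COLID5", 30699, 32083), ("COLID6", 66360, 66555),
]

def nearest_end_distance(position, regions):
    """Absolute distance from position to the closest region end."""
    return min(abs(position - r_end) for _, _, r_end in regions)

def containing_region(begin, end, regions):
    """Return the id of the first region that fully contains the marker, else None."""
    # Some sample rows have begin > end (e.g. MARK1), so normalise the order first.
    lo, hi = min(begin, end), max(begin, end)
    for region_id, r_begin, r_end in regions:
        if r_begin <= lo and hi <= r_end:
            return region_id
    return None

distances = [nearest_end_distance(begin, regions) for _, begin, _ in markers]
contained = [m_id for m_id, begin, end in markers
             if containing_region(begin, end, regions) is not None]

for (m_id, _, _), dist in zip(markers, distances):
    print(m_id, dist)
print("Markers inside a map region:", contained)
```

The resulting distances form a single column of values, which is the kind of input a one-variable test such as the KS test expects.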

ASKER CERTIFIED SOLUTION

Markus Fischer


haravallabhan

ASKER

Thank you very much. This is much clearer. The data in the files are not the complete, real data; I made them up to get clarity on my analysis. May I ask how you generated this graph, i.e. which software did you use? Is it MATLAB?

SOLUTION

Markus Fischer


SOLUTION

yuk99
