haravallabhan

Using R or Matlab for Statistical analysis

Hi,

I have two Excel files, each with a column of data in CSV format (please see the attached files). I would like to run a Kolmogorov-Smirnov test, with a graphical output that can be saved, and a Pearson correlation using these two datasets.
Can someone help me with how to go about doing this in R? I read that the R command for the KS test is

ks.test(x, y, ..., alternative = c("two.sided", "less", "greater"), exact = NULL)

But I am not sure how to input my data, or what two.sided, less, and greater mean. Ideally I would like R to accept the two files separately by asking me for each file name, then analyse them and give an output that can be imported into Excel and visualised graphically.
I am also happy to use MATLAB, although I am somewhat familiar with R.

Any help would be greatly appreciated.

Thanks.
MAP-R.xls
MARKER-R.xls
msheskey

Is this along the lines of what you are looking for? It is written in MATLAB. The file names are hardcoded, but that can be changed. I followed the documentation examples for reading Excel files (xlsread) and for the kstest2 function.
%get the number and text data from the marker spreadsheet
[markerNum, markerTxt] = xlsread('C:\Users\Matthew Sheskey\Desktop\MARKERID.xls','Sheet1');

%get the number and text data from the map spreadsheet
[mapNum, mapTxt] = xlsread('C:\Users\Matthew Sheskey\Desktop\MAPID.xls','Sheet1');

[h,p,k] = kstest2(markerNum,mapNum);

F1 = cdfplot(markerNum);
hold on
F2 = cdfplot(mapNum);
set(F1,'LineWidth',2,'Color','r')
set(F2,'LineWidth',2)
legend([F1 F2],'F1(markerNum)','F2(mapNum)','Location','NW')
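
Since the question asked for R, here is a rough R sketch of the same steps: read two files, run a two-sample KS test, and save a plot of the two empirical CDFs. It assumes each CSV holds a single numeric column with no header row. The function name run_ks, the output file name, and the synthetic demo data are made up for illustration and are not taken from the attached files.

```r
# Sketch: two-sample KS test on two single-column CSV files, with a saved plot.
# Assumption: each file has one numeric column and no header row.
run_ks <- function(file_x, file_y, plot_file = "ks_ecdf.pdf") {
  x <- read.csv(file_x, header = FALSE)[[1]]
  y <- read.csv(file_y, header = FALSE)[[1]]

  result <- ks.test(x, y, alternative = "two.sided")

  # Draw both empirical CDFs into a PDF file
  pdf(plot_file, width = 8, height = 6)
  plot(ecdf(x), col = "red", lwd = 2, main = "Empirical CDFs",
       xlim = range(c(x, y)), xlab = "value", ylab = "F(value)")
  lines(ecdf(y), col = "blue", lwd = 2)
  legend("topleft", legend = c(basename(file_x), basename(file_y)),
         col = c("red", "blue"), lwd = 2)
  dev.off()

  result
}

# Demo with synthetic data standing in for the real files
set.seed(1)
marker_csv <- tempfile(fileext = ".csv")
map_csv    <- tempfile(fileext = ".csv")
write.table(rnorm(80), marker_csv, row.names = FALSE, col.names = FALSE)
write.table(rnorm(120, mean = 0.5), map_csv, row.names = FALSE, col.names = FALSE)

plot_path <- tempfile(fileext = ".pdf")
ks_result <- run_ks(marker_csv, map_csv, plot_file = plot_path)
print(ks_result)
```

In an interactive session, calling run_ks(file.choose(), file.choose()) would open a file picker for each input instead of using hardcoded paths.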

haravallabhan

ASKER

Thanks, the KS test is working. Could you also please tell me what mapNum and mapTxt in
[mapNum, mapTxt] mean? Are they the columns in the sheet?
Also, what does
[h,p,k] mean?
This is the first time I am using MATLAB, so please bear with my basic questions.

Can you suggest any other type of correlation, other than Pearson, that I can use on this uneven data?

Thank you
Do you want this script to generate a new Excel file to avoid MATLAB altogether?
On the MATLAB help page, type in xlsread; this will guide you through how the function works. MATLAB can return multiple matrices from one function, and the names in the brackets are the names of the new variables that will be created from the file. When xlsread is called with two output variables, the first receives the numeric data and the second receives the text data. You can see this by double-clicking the variables in the Workspace box at the upper right, which lists all current variables. If a function does not return multiple values, or you are not asking for multiple values, the brackets on the left are not necessary. xlsread can also return the raw data, and its companion function for writing is xlswrite.
You will have to refer to MATLAB's documentation for the meaning of h, p, and k.  They will be the best reference.  Their documentation for this function has several citations.
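For quick orientation: in [h,p,k] = kstest2(...), h is the test decision (1 means the hypothesis that both samples come from the same distribution is rejected at the default 5% level), p is the p-value, and k is the KS statistic. Below is a small R sketch of the equivalent quantities from ks.test, which also touches on the "less" and "greater" alternatives from the original question. The data are synthetic stand-ins, and the variable names are only chosen to mirror the MATLAB outputs.

```r
set.seed(42)
marker <- rnorm(60)            # synthetic stand-in for the marker column
map    <- rnorm(90, mean = 1)  # synthetic stand-in for the map column

res <- ks.test(marker, map)    # alternative = "two.sided" is the default

k <- unname(res$statistic)     # D: largest vertical gap between the two ECDFs
p <- res$p.value               # p-value of the test
h <- as.integer(p < 0.05)      # 1 = reject "same distribution" at the 5% level

cat("k (D statistic):", k, "\n")
cat("p-value:", p, "\n")
cat("h (reject at 5%):", h, "\n")

# One-sided alternatives. In R, "greater" means the CDF of the first sample
# lies above that of the second (the first sample tends to be smaller);
# "less" is the opposite direction.
res_less    <- ks.test(marker, map, alternative = "less")
res_greater <- ks.test(marker, map, alternative = "greater")
cat("p (less):", res_less$p.value, " p (greater):", res_greater$p.value, "\n")
```

The two-sided D is simply the larger of the two one-sided statistics, so "two.sided" is usually the choice when there is no prior expectation about which sample should be larger.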
To deal with the unequal dataset sizes you need to bootstrap the data. MATLAB can do this: type bootstrap into the help dialog box and select the second link that shows up. I can help you code this, but the math part will need to come from you. I have a degree in statistics, so I know my way around, but the ultimate decision on the math needs to come from you.
There is also an R library called boot that will help you do this. I have attached a snippet of my professor's notes from college on the topic.
 
library(boot)
# First, create a function to obtain Spearman correlations from the data:
mycorr <- function(data, indices) {
  d <- data[indices, ]
  return(cor(d$age, d$rate, method = "spearman"))
}
# Next, run the bootstrapping using 1000 replications:
results <- boot(data=respiratory, statistic=mycorr, R=1000)
# Finally, view results of bootstrapping:
results
Here’s the output for the code above:
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = respiratory, statistic = mycorr, R = 1000)
Bootstrap Statistics :
      original        bias    std. error
t1* -0.7445018 0.001416301    0.01725391
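
The snippet above depends on a respiratory data frame that is not included here. A self-contained variant might look like the sketch below. It builds a synthetic data frame with hypothetical age and rate columns, so its numbers will not match the output shown, and it adds a percentile confidence interval via boot.ci. Note that a correlation of this kind assumes paired observations, i.e. the same number of rows in both columns.

```r
library(boot)

set.seed(123)
# Synthetic paired data; a hypothetical stand-in for the 'respiratory' data frame
n <- 100
age  <- runif(n, 20, 80)
rate <- 50 - 0.4 * age + rnorm(n, sd = 5)
respiratory <- data.frame(age = age, rate = rate)

# Statistic function: boot() passes the data and a vector of resampled row indices
spearman_stat <- function(data, indices) {
  d <- data[indices, ]
  cor(d$age, d$rate, method = "spearman")
}

# 1000 bootstrap replications of the Spearman correlation
results <- boot(data = respiratory, statistic = spearman_stat, R = 1000)
print(results)

# 95% percentile confidence interval for the correlation
ci <- boot.ci(results, type = "perc")
print(ci)
```

Here results$t0 holds the correlation on the original data and results$t holds the 1000 resampled values.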
"Do you want this script to generate a new Excel file to avoid MATLAB altogether?"

I would be better off with an R script for this, as I can integrate it with some of my Perl code, but it is still okay this way.

Another thing: in MATLAB I don't see any of the statistical output, i.e. the p value, D value, etc., which would let me interpret it much better. I only see the graphs.
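
On the missing output: in the MATLAB script the line [h,p,k] = kstest2(markerNum,mapNum); ends with a semicolon, which suppresses printing, so the values are stored in the workspace but not displayed. For the R route, a minimal sketch of collecting the D statistic and p-value into a one-row table and writing it to a CSV file that Excel can open is shown here. The data are synthetic, and the file name ks_results.csv and the column names are arbitrary choices for illustration.

```r
set.seed(7)
x <- rnorm(50)              # synthetic stand-in for the first dataset
y <- rnorm(70, mean = 0.3)  # synthetic stand-in for the second dataset

res <- ks.test(x, y)

# Collect the numbers of interest in a one-row data frame
summary_row <- data.frame(
  test        = res$method,
  alternative = res$alternative,
  D           = unname(res$statistic),
  p_value     = res$p.value,
  n_x         = length(x),
  n_y         = length(y)
)
print(summary_row)

# Write a CSV that can be opened directly in Excel
out_file <- file.path(tempdir(), "ks_results.csv")
write.csv(summary_row, out_file, row.names = FALSE)
cat("Results written to", out_file, "\n")
```

Saved as a script, something like this could presumably be run from Perl through the Rscript command-line tool, which may suit the Perl integration mentioned above.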

Thanks for the elaborate comment and help. I shall check the documentation as you mentioned and also see whether and how I should do the bootstrapping. I am new to statistics and know only the basics, so it is helpful to learn what I could do with my data.

Since you are a stats expert, could you also please have a look at this question I posted some time back? Any advice will be highly helpful.
https://www.experts-exchange.com/questions/26305568/Performing-Statistical-analysis-kolmogorov-smirnov-test-related-using-R-or-Matlab-or-Excel.html

You deserve more than 500 points, if only there were a way to award them.

Thank you.
You can call Perl scripts from MATLAB as well; just search for perl in the help page. But you should stick with the language you know best.