northward


Data Mining and Analysing Data

I teach at a school where many students take different combinations of subjects over several years. What is the easiest way to do data mining on their marks/grades?

For example, A, B and C are students and X, Y and Z are subjects.

A takes X and Y in 2011 and 2012
B takes Y and Z in 2011 and 2012
C takes X and Z in 2011 and 2012

My data is stored as (student, subject, year, mark) records:

A, X, 2011, m1
A, Y, 2011, m2
B, Y, 2011, m3
B, Z, 2011, m4
C, X, 2011, m5
C, Z, 2011, m6
A, X, 2012, m7
A, Y, 2012, m8
B, Y, 2012, m9
B, Z, 2012, m10
C, X, 2012, m11
C, Z, 2012, m12
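A convenient first step is to regroup these rows so that each (subject, year) pair becomes a column of marks keyed by student; pairing two columns then just means intersecting their sets of students. The sketch is one way to do that in Python (open source, standard library only). It assumes the data is comma-separated text with a numeric mark in the fourth field; the marks in `SAMPLE` are invented stand-ins for m1 to m6, and `load_marks` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the same "student, subject, year, mark" layout;
# the numeric marks are made up purely for illustration.
SAMPLE = """A, X, 2011, 55
A, Y, 2011, 62
B, Y, 2011, 70
B, Z, 2011, 48
C, X, 2011, 81
C, Z, 2011, 77
"""


def load_marks(text):
    """Group rows into {(subject, year): {student: mark}}."""
    table = defaultdict(dict)
    # skipinitialspace drops the blank after each comma
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        student, subject, year, mark = row
        table[(subject, int(year))][student] = float(mark)
    return dict(table)


marks = load_marks(SAMPLE)
print(marks[("X", 2011)])  # {'A': 55.0, 'C': 81.0}
```

For a real file, `open("marks.csv", newline="")` (a hypothetical file name) can be read the same way in place of `io.StringIO`.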

In reality I have many more students and subjects, spread over several years.

I want to go through all the combinations systematically. Say I have n subjects: choose two subject-year pairs (e.g. X in 2011 and Y in 2012, or X in 2011 and X in 2012), check whether more than m students (maybe m = 10) have marks for both, and if so run a linear regression and return its R-squared value together with the gradient and intercept.
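A minimal sketch of that loop, assuming the marks have already been grouped as `{(subject, year): {student: mark}}` (a hypothetical layout, not something given in the question). It fits ordinary least squares `mark_b = gradient * mark_a + intercept` for every ordered pair of columns sharing more than `m` students, and reports R-squared as the squared correlation, which is what R-squared equals for a simple regression with an intercept. Both directions of each pair are fitted because the gradient and intercept depend on which subject is the predictor; filter on year if only "earlier predicts later" is wanted. If SciPy is available, `scipy.stats.linregress` gives the same slope and intercept (square its `rvalue` for R-squared). The function names are illustrative only.

```python
from itertools import permutations


def linear_fit(xs, ys):
    """Least-squares fit y = gradient * x + intercept; returns (gradient, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None  # a constant column: gradient or R-squared is undefined
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return gradient, intercept, r_squared


def all_pair_regressions(marks, m=10):
    """Regress each (subject, year) column on every other one with more than m shared students."""
    results = []
    for a, b in permutations(sorted(marks), 2):
        common = sorted(marks[a].keys() & marks[b].keys())  # students with both marks
        if len(common) <= m:
            continue  # too few points for a meaningful fit
        xs = [marks[a][s] for s in common]
        ys = [marks[b][s] for s in common]
        fit = linear_fit(xs, ys)
        if fit is not None:
            results.append((a, b, len(common)) + fit)
    return results


# Made-up demo: 12 students whose Y 2012 mark is exactly 2 * X 2011 mark + 1.
demo = {
    ("X", 2011): {f"S{i}": float(i) for i in range(12)},
    ("Y", 2012): {f"S{i}": 2.0 * i + 1 for i in range(12)},
}
for row in all_pair_regressions(demo, m=10):
    print(row)
```

In the demo the X 2011 to Y 2012 fit should have gradient 2, intercept 1 and R-squared 1; the reverse direction has gradient 0.5.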

The task may be interrupted and continued on another day.
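One simple way to make the job resumable, offered as an assumption rather than a prescribed method: append each finished result to a checkpoint file as one JSON line, and on restart skip every task whose key is already recorded. The file name `results.jsonl` and the helpers `run_resumable` and `load_done` are hypothetical; `compute` stands for whatever per-pair work is done (for example, the regression), and each task must be JSON-serialisable, such as a tuple of subjects and years.

```python
import json
import os


def load_done(path):
    """Return the set of task keys already recorded in the checkpoint file."""
    done = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    done.add(json.loads(line)["key"])
                except json.JSONDecodeError:
                    continue  # skip a line left half-written by an interrupted run
    return done


def run_resumable(tasks, compute, path="results.jsonl"):
    """Run compute(task) for each task not yet in `path`, appending one JSON line per result."""
    done = load_done(path)
    with open(path, "a", encoding="utf-8") as out:
        for task in tasks:
            key = json.dumps(task)  # stable string key, e.g. '[1, 2]' for (1, 2)
            if key in done:
                continue  # finished in an earlier session
            result = compute(task)
            out.write(json.dumps({"key": key, "result": result}) + "\n")
            out.flush()  # hand each line to the OS so an interrupted process keeps it
```

Calling `run_resumable` again with the same task list after an interruption only computes the tasks that are missing from the file.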

If it takes too long on one machine, I may decide to run it in the cloud instead.
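For spreading the work over several machines, one low-tech approach (an assumption, not the only option) is to have every machine build the same sorted task list and process only its own slice. With n subjects over T years there are K = nT subject-year columns and K(K - 1) ordered pairs, so the work grows roughly with the square of K. The hypothetical `shard` helper picks tasks round-robin by index; each machine would be started with its own `worker_id` from 0 to `num_workers - 1`.

```python
from itertools import permutations


def shard(tasks, worker_id, num_workers):
    """Deterministically select this worker's share of the tasks (round-robin by index)."""
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    return [t for i, t in enumerate(tasks) if i % num_workers == worker_id]


# Illustrative task list: 3 subjects over 2 years -> 6 columns -> 6 * 5 = 30 ordered pairs.
# Sorting first guarantees every machine sees the tasks in the same order.
columns = sorted((s, y) for s in "XYZ" for y in (2011, 2012))
pairs = list(permutations(columns, 2))
parts = [shard(pairs, w, 4) for w in range(4)]
print([len(p) for p in parts])
```

Giving each worker its own output file avoids write conflicts between machines; the files can simply be concatenated afterwards.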

What would you suggest I do?

Thanks.
northward

ASKER

By the way, I would prefer to use open-source tools if possible.
SOLUTION
vasto

SOLUTION
ASKER CERTIFIED SOLUTION
Thank you.  :)