Simple (?) Correlation Matrix in SQL
Posted on 2011-09-28
I have responses to a quiz sitting in a table something like
Student_ID Question_ID Correct(Y/N)
I would like to correlate the likelihood of a question being answered correctly to all of the other questions and their likelihood of being answered correctly. My friend recently mumbled something about "correlation matrix" and web reading makes the pearson coefficient sound useful but I don't know much about statistics.
In the long run, I want to be able to ask a series of questions about any topic which are then graded either Correct or Incorrect (no middle ground) and then be able to make predictions about other questions in the set a person may be likely to get right or wrong (e.g. you correctly answered that bread is made of flour, so perhaps you also know that cakes are made of flour ... vice versa I do not know what water is, so its unlikely I know what ice is).
Rather than me making assumptions about topics and concepts that go together, I would like to be dynamically gathering data and understanding how students connect different concepts (and yes hoping to find lots of surprising connections!).
In a perfect world I would like to do this all in SQL Server itself without having to move data back and forth into statistical packages - ideally the model can keep building based on new inputs and adapt over time.