Link to home
Start Free TrialLog in
Avatar of Split_Pin
Split_PinFlag for Australia

asked on

Correlating Exam Answers to each other

Hello Mathematics Geniuses,

I would like to know a way to construct a dynamic correlation matrix (ideally in Sql Server) so that I can compare each exam question (the likelihood of it being answered correctly) in relation to every other question.

As responses will be stored in a SQL Server table, it would be great if somehow the correlation can be done in SQL Server too, so as more people complete the exam the dataset on which the correlations are based can grow and become more reliable.

The starting data set will be very simple, along the lines of

Student_ID    Question_ID   Correct(Y/N)   Timestamp

I am looking to find connections between pairs or larger combinations of questions, based only on the students ability to answer them correctly, thus avoiding superimposing assumptions about the relationships between different topics. The end result being that questions in exams conducted online can be ordered (dynamically) from easiest to hardest, based on responses made so far. It may also be used to skip questions that are likely to be correct and thus focus on other areas of greater challenge.

Apologies if that does not sound terribly clear and thanks for your time!

Avatar of Aaron Tomosky
Aaron Tomosky
Flag of United States of America image

Does each student only take the test once? If not you need to add a testId to keep each test together.

Getting stats on correct vs incorrect to judge difficulty is easy, correlating across questions is the hard part. I'll think on it more...
Avatar of Split_Pin

ASKER

Thats a valuable insight aarontomosky - I was thinking of using the minimum timestamp for each Student/Question combination to separate the tests - particularly if used for test-revise-retest where hopefully the student will move towards 100% correct. I am guessing the correlation I really want to know is based on the first time attempt only.

Maybe my use of the words "likelihood of it being answered correctly" is not even right - but in plain English, we could guess that if Mr Split_Pin knows what Cheese is, but does not know what Correlation is, then its "likely" he knows what Milk is, and accordingly unlikely he knows what the Pearson Coefficient is  (And perhaps there is no correlation at all between knowledge of dairy products and knowledge of stats).

The trick is, I don't want to invent any categories - rather I want to pour a whole bunch of questions into my database and have the students show me (through the stats) which ideas actually associate (postively or negatively).

Cheers, Split_Pin
SOLUTION
Avatar of Aaron Tomosky
Aaron Tomosky
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Greetings, hope you are both enjoying the weekend.
If we just focus for a moment on

just look at the percent that got the two right compared to the average amount that got both of any pair of questions right.

This is really sounding like something SQL could be good at [and yes the maths worries me :-) ]

If we join every response every other response by the same student (like a self join on a.Question_ID > b.Question_ID AND a.Student_ID = b.Student_ID) we would be mapping every pair (taking that 2 and 3 is same as 3 and 2, and we dont need to map it both ways). If we gave the pairs a result maybe based on 1 * 1 = 1 (both correct) we can then work out the % correct for any random pair and then seek combinations that exceed the average...

I guess by joining again to another version of the same table we could work out triplicates (all 3 out of 3 correct) too, and so on.

I think this method would hold up as well in cases where students have not all answered every question, say there are 100 questions and Frank answers 50, Jean answers 30 (some of which are the same as Frank and some of which are different) we can still cross map all the known pairs.

Please do let me know if I sound like I have completely lost the point here!!
Really appreciate your input. It sounds like aarontomovsky's idea could help to organise the end result into a table than can be used to select the next question to be asked.

Well I had better start prototyping - will split the points as you've both got me thinking along the right lines.
Cheers, Split_Pin