Solved

Correlating Exam Answers to each other

Posted on 2011-09-29
5
412 Views
Last Modified: 2012-05-12
Hello Mathematics Geniuses,

I would like to know a way to construct a dynamic correlation matrix (ideally in Sql Server) so that I can compare each exam question (the likelihood of it being answered correctly) in relation to every other question.

As responses will be stored in a SQL Server table, it would be great if somehow the correlation can be done in SQL Server too, so as more people complete the exam the dataset on which the correlations are based can grow and become more reliable.

The starting data set will be very simple, along the lines of

Student_ID    Question_ID   Correct(Y/N)   Timestamp

I am looking to find connections between pairs or larger combinations of questions, based only on the students ability to answer them correctly, thus avoiding superimposing assumptions about the relationships between different topics. The end result being that questions in exams conducted online can be ordered (dynamically) from easiest to hardest, based on responses made so far. It may also be used to skip questions that are likely to be correct and thus focus on other areas of greater challenge.

Apologies if that does not sound terribly clear and thanks for your time!

0
Comment
Question by:Split_Pin
  • 2
  • 2
5 Comments
 
LVL 38

Expert Comment

by:Aaron Tomosky
ID: 36853122
Does each student only take the test once? If not you need to add a testId to keep each test together.

Getting stats on correct vs incorrect to judge difficulty is easy, correlating across questions is the hard part. I'll think on it more...
0
 

Author Comment

by:Split_Pin
ID: 36890403
Thats a valuable insight aarontomosky - I was thinking of using the minimum timestamp for each Student/Question combination to separate the tests - particularly if used for test-revise-retest where hopefully the student will move towards 100% correct. I am guessing the correlation I really want to know is based on the first time attempt only.

Maybe my use of the words "likelihood of it being answered correctly" is not even right - but in plain English, we could guess that if Mr Split_Pin knows what Cheese is, but does not know what Correlation is, then its "likely" he knows what Milk is, and accordingly unlikely he knows what the Pearson Coefficient is  (And perhaps there is no correlation at all between knowledge of dairy products and knowledge of stats).

The trick is, I don't want to invent any categories - rather I want to pour a whole bunch of questions into my database and have the students show me (through the stats) which ideas actually associate (postively or negatively).

Cheers, Split_Pin
0
 
LVL 38

Assisted Solution

by:Aaron Tomosky
Aaron Tomosky earned 250 total points
ID: 36891910
I can roughly think of a way to do this, but it may be rough to keep it all in tsql code.
Start with q1, sum all the other correct answers here q1 is true. For the ones over x% correct we add "q1" to their column in another table.

Do this for each question and you will have a list of associated questions, maybe even giving extra weight somehow if pairs are related both ways.

Just to look at all the data in a square matrix, you could put .79q1 or something to show that 79% of people that got q1 right got this right.

Just an outline for now but The more I think on it the more it seems doable
0
 
LVL 37

Accepted Solution

by:
TommySzalapski earned 250 total points
ID: 36892402
The right way to do this (from a statistician point of view) would be to use logistic regression. Model a wrong answer as 0 and a correct answer as 1 and choose from any of the thousands of regression packages out there.
You could also use linear regression, but logistic is better when the dependent variable is binary (two options).

If you've never taken any stat courses and are worried at the amount of math involved (which isn't really that much) then you could just look at the percent that got the two right compared to the average amount that got both of any pair of quesions right.

One problem that you will have in either case is the age-old causation vs. correlation problems.
x and y are correlated so did x cause y or did y cause x or did some other factor cause both?
0
 

Author Comment

by:Split_Pin
ID: 36895516
Greetings, hope you are both enjoying the weekend.
If we just focus for a moment on

just look at the percent that got the two right compared to the average amount that got both of any pair of questions right.

This is really sounding like something SQL could be good at [and yes the maths worries me :-) ]

If we join every response every other response by the same student (like a self join on a.Question_ID > b.Question_ID AND a.Student_ID = b.Student_ID) we would be mapping every pair (taking that 2 and 3 is same as 3 and 2, and we dont need to map it both ways). If we gave the pairs a result maybe based on 1 * 1 = 1 (both correct) we can then work out the % correct for any random pair and then seek combinations that exceed the average...

I guess by joining again to another version of the same table we could work out triplicates (all 3 out of 3 correct) too, and so on.

I think this method would hold up as well in cases where students have not all answered every question, say there are 100 questions and Frank answers 50, Jean answers 30 (some of which are the same as Frank and some of which are different) we can still cross map all the known pairs.

Please do let me know if I sound like I have completely lost the point here!!
Really appreciate your input. It sounds like aarontomovsky's idea could help to organise the end result into a table than can be used to select the next question to be asked.

Well I had better start prototyping - will split the points as you've both got me thinking along the right lines.
Cheers, Split_Pin

0

Featured Post

What Is Threat Intelligence?

Threat intelligence is often discussed, but rarely understood. Starting with a precise definition, along with clear business goals, is essential.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
Info needed on Algorithmic trading 1 83
Statistics & Data Sceicne 2 81
homophone word for ***archeries*** 5 83
Data array of weights 2 58
Foreword (May 2015) This web page has appeared at Google.  It's definitely worth considering! https://www.google.com/about/careers/students/guide-to-technical-development.html How to Know You are Making a Difference at EE In August, 2013, one …
Article by: Nicole
This is a research brief on the potential colonization of humans on Mars.
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

14 Experts available now in Live!

Get 1:1 Help Now