Need help with pattern analysis

Experts:

I need some assistance with pattern analysis (preferably in MS-Excel).

General Background:
- I've released an electronic survey and anticipate several thousand responses.
- The survey contains a group of questions (i.e., 24) for which a 7-point Likert scale is used.  
- I anticipate that more than 95% of those collected surveys will be valid.   That is, users have identified their actual opinions.
- However, there's a slight chance that some users arbitrarily selected values.   These would be considered "careless" responses and should be deleted before conducting statistical analysis.

Problem Background (patterns):
- In the event that careless responses will exist, some may follow obvious patterns.   These could be in the following form:
-- All 24 answers have the same Likert scale values (e.g., "only 1s" or "only 6s")
-- All 24 answers are either in repeated ASC or DESC order (e.g., "1, 2, 3, ..., 7, 1, 2, 3, ..., 7,..." or "7, 6, 5, ..., 1, 7, 6, 5, ...).

Assuming these patterns do exist, they could be easily identified in a database (e.g., MS-Access) and then deleted from the dataset.

However, there may be other patterns (e.g., several 3s, followed by several 4, followed by several 7s, etc. etc.) that could not be easily predicted/identified through automation.

This is where I need some help... please see attached XLS that illustrates some examples of possible responses patterns.

For those -- with a red comment -- is there a way that some VBA script might detect a suspicious response?   This is not to say it really is... however, given that several thousands survey would have to be reviewed, any automation would certain help to streamline this pre-analysis process.

Thoughts/recommendations for a VBA routine?

Thanks,
EEH
Pattern-Analysis.xls
ExpExchHelpAnalystAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

TommySzalapskiCommented:
I would imagine if someone was being careless but didn't want it to be obvious, that a simple pattern check wouldn't work well at all.

I also would think 95% valid may be overly optimistic depending on who you are surveying.

Maybe it's too late, but a better way to detect careless responses is to inject questions that should have correlated responses.
Like
question 1: Rate how much you liked product A
...
question 23: Rate how much you disliked product A

If both answers get 6 or 7, then it's pretty obvious the person isn't paying attention.
0
d-glitchCommented:
Another item for the Maybe it's too late category;

You could also measure the amount of time spent on each survey and reject the people who fill it in too fast.
0
aikimarkCommented:
Tommy is correct.  You should structure your surveys to ask the same thing differently, looking for consistency.

As far as pattern recognition is concerned, you can look for patterns, but you shouldn't try to look for patterns of length less than four.  We can use an algorithm similar to that in plagiarism software...we hash sequences of strings (words/numbers) and compare the hash values.

The most likely data structures to use will be a dictionary object or an integer array.

=============
In the following code example, I use a dictionary object to store the counts of the unique sequences.  I have two threshold sets (sequences of four values, repeated more than two times or sequences of five values, repeated more than once).

Option Explicit

Public Sub HighlightSuspicious()
    Dim rng As Range
    Dim rngRow As Range
    Dim vRowData As Variant
    Dim oDicUnique As Object
    Dim lngRow As Long, lngCol As Long, lngColOffset As Long
    Dim strPattern As String
    Dim vItem As Variant
    Dim vGroupings As Variant
    Dim vGroup As Variant
    vGroupings = Array(Array(4, 2), Array(5, 1))
    Set oDicUnique = CreateObject("scripting.dictionary")
    Set rng = ActiveSheet.Range(ActiveSheet.Cells(2, 1), ActiveSheet.Cells(ActiveSheet.Cells.SpecialCells(xlCellTypeLastCell).Row, 24))
    vRowData = rng.Value
    For Each vGroup In vGroupings
        For lngRow = LBound(vRowData, 1) To UBound(vRowData, 1)
            For lngCol = LBound(vRowData, 2) To UBound(vRowData, 2) - vGroup(0)
                strPattern = vbNullString
                lngColOffset = 0
                Do
                    strPattern = strPattern & vRowData(lngRow, lngCol + lngColOffset)
                    lngColOffset = lngColOffset + 1
                Loop Until Len(strPattern) = vGroup(0)
                If oDicUnique.exists(strPattern) Then
                    oDicUnique(strPattern) = oDicUnique(strPattern) + 1
                Else
                    oDicUnique.Add strPattern, 1
                End If
            Next
            For Each vItem In oDicUnique
                If oDicUnique(vItem) > vGroup(1) Then
                    rng.Rows(lngRow).Interior.Color = vbYellow   'highlight the cells
                    Debug.Print "Found " & oDicUnique(vItem) & " : " & vItem & vbTab & " in row " & lngRow + rng.Row - 1
                    Exit For
                End If
            Next
            oDicUnique.RemoveAll
        Next
    Next
End Sub

Open in new window

Note: When a grouping pattern criterion is found, the iteration for that row stops.  Only the first duplicate pattern is listed in the Immediate window.

When I ran the code against your workbook, it gave the following results in the Immediate window.  Please note that row 2 has a repeated pattern in both length 4 and length 5 passes.
Found 3 : 1171   in row 2
Found 20 : 1111  in row 3
Found 20 : 6666  in row 5
Found 3 : 1234   in row 6
Found 3 : 7654   in row 8
Found 3 : 3333   in row 9
Found 5 : 2211   in row 10
Found 5 : 2211   in row 11
Found 2 : 11171  in row 2
Found 19 : 11111     in row 3
Found 19 : 66666     in row 5
Found 3 : 12345  in row 6
Found 3 : 76543  in row 8
Found 2 : 11111  in row 9
Found 5 : 22112  in row 10
Found 5 : 22112  in row 11
Found 2 : 12345  in row 12

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Rowby Goren Makes an Impact on Screen and Online

Learn about longtime user Rowby Goren and his great contributions to the site. We explore his method for posing questions that are likely to yield a solution, and take a look at how his career transformed from a Hollywood writer to a website entrepreneur.

ExpExchHelpAnalystAuthor Commented:
@aikimark:

Wow... another brilliant solution of yours.

I plugged in the VBA into a module.    All of the suspicious rows were highlighted in yellow.   Excellent!

What I didn't see was the "Immediate window" that outputted the pattern such as...
Found 3 : 1171   in row 2
Found 20 : 1111  in row 3
Found 20 : 6666  in row 5
....
....

In addition to having the yellow rows, outputting the pattern into another tab -- or window from which it can be copied -- would be excellent!

Thanks,
EEH
0
ExpExchHelpAnalystAuthor Commented:
@d-glitch:

Excellent point... and it's actually not too late.   The survey tool that I'm using gives me both the individual's completion time as well as the average completion time of all participating users.  

So, yes, I'll make sure I check the completion times... anything less than e.g., 10 minutes would be marked as "suspicious" as well.

Thanks,
EEH
0
aikimarkCommented:
press Ctrl+G to see the Immediate window or use the View menu.
0
aikimarkCommented:
I wouldn't worry about copying the duplicate rows until we know that we can find the suspect rows without too many false positives.
0
Ejgil HedegaardCommented:
You have no way of detecting that responses are careless, or just misunderstood, using patterns.
As TommySzalapski says, careless will probably be random answers, misunderstood can be anything. For both, some of the answers are valid and others not.
This can only be detected using correlated questions, and/or grouped questions where answers in the group must correlate.
Grouped questions could be products of the same type, expected to have more or less similar answers. Preferable some in the group asked positive (like), and others negative (dislike).

Then you can make a list of logical errors, like q1>4 and q23>4, and eliminate records with too many logical errors, so you have to define how many errors are acceptable, and that depends of how close the correlation and/or grouping can be made.
0
ExpExchHelpAnalystAuthor Commented:
aikimark:

When clicking CTRL+G, the GoTo window pops up (I'm using Excel 2003).   It's empty though... check out attached JPG.

Does it work differently, maybe, in Excel 2007/2010?

EEH
IntermediateWindow.jpg
0
ExpExchHelpAnalystAuthor Commented:
@hgholt:

Agreed... just because a record will be highlighted doesn't mean it'll be deleted.

However, applying the proposed methodology will most definitely streamline the review/analysis of potential violators (vs. manually looking at e.g., 2000 records).

That combined with d-glitch's recommendation to review the survey completion time will enable me to make a very informed decision.   Average survey completion time is anticipated at 25 minutes.    Thus, if a record is highlighted in yellow and survey completion time was only 5 minutes, it'll be a "goner".   ;)

Thanks for your valid point though.

EEH
0
aikimarkCommented:
Ctrl+G when you are in the code window
0
ExpExchHelpAnalystAuthor Commented:
Sorry, my bad.

When putting the cursor into the VBA (module), nothing happens when clicking Ctrl+G.

I also tried to run the first run the code from this view followed by Ctrl-G.   Still, nothing.

What am I missing?

EEH
0
aikimarkCommented:
try the View menu.  It is possible that your Immediate window is behind some other window/pane or that it is of such a small size (height) that you can't see its contents.
0
ExpExchHelpAnalystAuthor Commented:
Great... that works!

Thank you again for providing a fantastic solution!!

EEH
0
ExpExchHelpAnalystAuthor Commented:
Most excellent solution!!
0
ExpExchHelpAnalystAuthor Commented:
@aikimark:

Quick follow-up... after running the macro, is there a way to flag another column/cell that indicates a suspicious pattern was found.

Here's the process:
1. Columns A:X have white cell background color
2. Run the macro
3. Rows where pattern was found is now highlighted in yellow (e.g., row 15 and row 23 out of 40 rows)

Then:
a. Based on the macro, use a formula such as "=If(A1 = yellow, "Suspicious","Ok)
b. Apply the formula to, e.g., AA1:AA40
c. In this case, AA15 and AA23 should display "Suspicious"

Then:
- Before adding another dataset, I apply white cell background color to columns A:X
- At this time, however, the two cells (AA15 and AA23) still display "Suspicious" vs. "Ok".

My question, how can such function be automatically be refreshed?  

I've posted a new question on this (new) topic at:
http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28283707.html

In that question, however, I didn't include all the details.

Any thoughts/recommendations,
EEH
0
aikimarkCommented:
you might be able to do this with Excel 2007/2010, but I haven't tried it.  It would be easier to do in the code.  I'll look at your new question.
0
ExpExchHelpAnalystAuthor Commented:
@aikimark:

May I ask a follow-up question?

First of the, the pattern analysis function works great.   At this time, I'm performing some data analysis on a recently received data set which now contains about 900 survey records.

When executing the function, it highlights all suspicious records.   However, as part of the execution, I'm getting a run-time error.   See attached JPG.   When clicking on Debug, the line "strPattern = strPattern & vRowData(lngRow, lngCol + lngColOffset)" is highlighted.

Again, the programs still executes fine but I'm not sure as to why I get the run-time error?
I never saw this when testing it on approximately 20 records (now we get ~900).

Thanks,
EEH
RunTimeError.jpg
0
aikimarkCommented:
What are the variable values?
strPattern
vRowData(lngRow, lngCol + lngColOffset)
lngRow
lngCol
lngColOffset

What is the data on that row?
0
ExpExchHelpAnalystAuthor Commented:
Thanks for the reply... I appreciate it.

The variable values are based on a 7-point Likert scale (+ value 8 for "N/A").   So, it only has check repeated patterns between 1 to 8.

In respect to "which row"... I checked the intermediate window.   It refers to row 876 (out of 886).  

That row is highlighted and is the last one that has a suspicious pattern.   So, again, the function fully executes and finds all suspicious values in the current data set.   I'm just not sure as to why it shows the run-time error.

Does that help?

EEH
0
aikimarkCommented:
in the immediate window, type a question mark followed by a variable name or expression.
0
ExpExchHelpAnalystAuthor Commented:
@aikimark:

Ok... I've done that but I'm not sure how it might help... please see attached JPG.

What am I missing?

EEH
suspiciousPattern.jpg
0
aikimarkCommented:
I need to know each of the variables' values.  Please do this:
?strPattern, lngRow, lngCol, lngColOffset
?vRowData(lngRow, lngCol + lngColOffset)
0
ExpExchHelpAnalystAuthor Commented:
aikimark:

First line (?strPattern....) produced an input.  

Second line (?vRowData) resulted in the run-time error.

Please see attached JPG.

EEH
SuspiciousPattern.jpg
0
aikimarkCommented:
please post the workbook/worksheet with the data being validated
0
ExpExchHelpAnalystAuthor Commented:
aikimark:

Any chance I can email the XLS?  

EEH
0
aikimarkCommented:
Unless the data is more sensitive than that which you've already posted, EE prefers that all communication be public.
0
ExpExchHelpAnalystAuthor Commented:
The original data was "dummy" data.  Current data is actual data.    I do need to adhere to protecting the information.

If that's not doable, I'll have to live w/ the run-time error.

EEH
0
aikimarkCommented:
you can send it to ****@****.***
0
ExpExchHelpAnalystAuthor Commented:
aikimark -- thank you... I just emailed it to you.

EEH
0
ExpExchHelpAnalystAuthor Commented:
@aikimark:

Did the email w/ attachment make it through?

Thx,
EEH
0
aikimarkCommented:
yes.  I received it.  Looking at it now
0
ExpExchHelpAnalystAuthor Commented:
Thanks, aikimark.   I appreciate your help!!   ;)

EEH
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Excel

From novice to tech pro — start learning today.