Count occurrence of pattern in excel column

I'm looking for a way to look through a column and determine the occurrences of specific patterns. The file contains test answer keys with several columns, the important ones are:
question number, answer, test name
As the file contains a list of many tests, the results need to be based per test, sort of like the subtotal option for "at change in X column"

There are two main patterns we need to identify:
- How many times answer X occurs consequentially
- How many times each answer choice occurs in the test

I would think maybe highlighting any time a specific answer occurs 4 or more times in a row would be one good output. And at the end of each test, a row that lists the counts of each answer choice.

At the moment, the answer choices are stored as numbers (ordinal rather than letters) as text. I've got a simple nested if formula to translate the numbers to corresponding letters, I would plan on running that prior to running the pattern analysis.
musickmannData AnalystAsked:
Who is Participating?
skipper68Application Development ManagerCommented:
The reason the file got so large is because the code went rogue because of the test names.  There were over a million rows of summary data! :-)

I've attached a new version.  Run it as is and let me know what happens.  

It's more flexible now that it just searches for the next test name in the sheet and compares it to the previous one it had.  If it matches, it continues the count, if it doesn't, it assumes it's the last record and prints out the summary row.

Let me know how this works.
skipper68Application Development ManagerCommented:
The first one is pretty simple.  To count the instances of "Skipper" in column A, use this function

I'll try to get something about the second one together.
skipper68Application Development ManagerCommented:
If you don't have a lot of answer choices, you can use multiple CountIf statements like I did above.  That would be the quickest way.  Aside from writing a VB module or using a PowerPivot table, dynamically getting the counts might be difficult.

Column A          Column B
[Answer1]          =COUNTIF(A:A, $A1)
[Answer2]          =COUNTIF(A:A, $A2)
[Answer3]          =COUNTIF(A:A, $A3)
[Answer4]          =COUNTIF(A:A, $A4)
[Answer5]          =COUNTIF(A:A, $A5)
[Answer6]          =COUNTIF(A:A, $A6)
Cloud Class® Course: Amazon Web Services - Basic

Are you thinking about creating an Amazon Web Services account for your business? Not sure where to start? In this course you’ll get an overview of the history of AWS and take a tour of their user interface.

skipper68Application Development ManagerCommented:
Here's a VBA function I wrote that'll do what you need.  I included comments inline.

Sub CountUniqueInstances()
    Dim Col As New Collection
    Dim itm
    Dim i As Long
    Dim CellVal As Variant
    'Get the last row which has active data
    TheLastRow = ActiveSheet.UsedRange.SpecialCells(xlCellTypeLastCell).Row
    'Loop through the single column of each row adding unique values to the collection
    For i = 1 To TheLastRow
        CellVal = Sheets("Sheet1").Range("A" & i).Value
        If CellVal <> "" Then
            On Error Resume Next
            Col.Add CellVal, Chr(34) & CellVal & Chr(34)
            On Error GoTo 0
        End If
    Next i

    'Each value will be written on row 1 starting at column B
    ColumnIndex = 2 'if you want the column to be C, change this to 3, for D, change this to 4, etc.
    For Each itm In Col
        'count the occurrence of each unique value found
        iVal = Application.WorksheetFunction.CountIf(Range("A:A"), itm)
        Cells(1, ColumnIndex) = itm & ":" & iVal
        ColumnIndex = ColumnIndex + 1
End Sub

Open in new window

musickmannData AnalystAuthor Commented:
Thanks. The countif option is something I started with, and I thought would get me part of the way.

so all the answer choices are 1, 2, 3, or 4, which correspond to A, B, C, D

Since the file I have contains hundreds of tests with about 50 questions each, I'm looking for a way to do this en mass.

You would be a true savior if there was a way to create a new sheet that had a summary of some kind.

I'm attaching a file that contains a single test of dummy data, I added a summary sheet as an example of what would be ideal. The "Odd pattern" column would refer to something I could specify somewhere. I would essentially just make a list of what I would consider odd patterns since that is more complicated I know. But the counts would be the first step across the entire document.
musickmannData AnalystAuthor Commented:
I'm not sure I follow the script you have.

I changed line 12 to   Range("C" & i) since I have the data I need to count in column C.
I changed line 23 to ColumnIndex = 3 for the same reason
I cahnged line 26 to Range(C:C) again for the same reason

Maybe I shouldn't have adjusted those? The result I get in the first row doesn't seem to make sense.
This is Column C that I'm trying to count and what came up in the first row when running the script:

skipper68Application Development ManagerCommented:
I don't think the attachment got included.
musickmannData AnalystAuthor Commented:
Sorry, I guess I didn't click the Upload button after selecting the file. doh.
skipper68Application Development ManagerCommented:
Maybe try to set the CountIf statements on the Summary Sheet.  It'll automatically update based on the values.

If you add another sheet (i.e. Test), you just have to copy\paste a new row to the summary sheet and edit the sheet name in the 4 formulas.
musickmannData AnalystAuthor Commented:
That is a great simple solution for the counting of the occurrences. Right now I have about 300 tests or so in a single sheet, is there some way to use a script to run through the sheet, then automatically fill in the summary page, making an entry each time the test name column changes? I thought of the subtotal feature, but I don't know how to make that work counting the occurances of the 4 different answer options in the single column.

Do you have any thoughts on identifying patters? The most critical pattern would be to identify when there are 4 or more of the same answer consecutively.
skipper68Application Development ManagerCommented:
I'm sure a script could be written. Give me a little bit to see what I can come up with.  So just to be clear, there is only 1 sheet with all the tests.  Do you have a more accurate example I can play with?
musickmannData AnalystAuthor Commented:

Here is a more complete sample set.
skipper68Application Development ManagerCommented:
I'm glad I asked for an updated version because the PP column wasn't on the first version.

Here's the scoop.  I got a working version.  
1. Open the excel doc I've attached.  If you get an "Enable Macros" warning, click the "Enable Content" button.  Because there's code running behind it, it'll give you this warning unless you change the security settings.  
2. Next, Press "Alt" + F8 to open the macro window
3. Run the "CalculateTestResults" macro
4. Check out the results on the "Summary" tab.
5. Change one of the tests to have 4 concurrent values.  Rerun the macro
     The OddPatterns column will now show a "Y" and be highlighted.

Here are the instructions going forward:
1. Unless you're familiar with creating Excel VB Macros, you'll have to copy your data into this worksheet I provided.
2. Copy the data into the "Tests" sheet of the workbook.
     PLEASE NOTE, the code I wrote is dependent on the column ordering.  If you add, change, or remove columns, someone will have to change the code.
3. Press ALT + F8 to open the macro window.
4. Run the "CalculateTestResults" Macro
5. Check out the results on the summary tab.

musickmannData AnalystAuthor Commented:
So, I'm getting an error:
Run-time error '1004'
Application-defined or object-defined error

The debug option points to this line:
Sheets("Summary").Cells(TestNumber + 2, 1).Value = "Test " & TestNumber

The only difference in data from the sample file is the test name perhaps? The names can contain spaces, dashes, periods, and underscores. I think they are limited to 100 characters in length. Maybe that has something to do with it?

The export file has so much extra info, I tried removing the extraneous stuff so it didn't cause confusion, so the PP column is just points possible on the item.

While I can't write macros, I do have some limited programming skills, most of which is being able to read through and make small changes.

I'm going to try my hand at tracking down the error. I am running on a Mac, and also tried using a windows VM, so not sure if that has anything to do with this.
skipper68Application Development ManagerCommented:
Did the file work the first time you ran it before you made changes?  The problem is based on what you gave me, i assumed all the test names would be "test x".  Hence my request for a more real-life sample set.  :-)  Now that you said its not always the case, there is an issue.  I may be able to tweak the code to look at the previous test name and compare it to the current one to see if they are different.  Lemme take another look.
musickmannData AnalystAuthor Commented:
Yeah, I was afraid of that. I had to be careful given the source of the data is secure testing material, and since these posts can become publicly available, I was trying to keep things generic as possible.

The number of questions per test is not consistent either, so one test may have 50, one may have 40, etc. I think what I can do is manually replace the test names to match the text x expected, then reconnect the official names after running the analysis.
skipper68Application Development ManagerCommented:
The question count won't matter, the code will handle that. I'll see if I can make the test names more flexible. If it's a ton of work I'll let you know.  You can send me a clear example to my personal email if you want. It'll definitely make testing the new code easier.
musickmannData AnalystAuthor Commented:
Well, here is an exact data dump, I did a replacement of test names, so this is the full file I'm working with, same error.

This is the file, after running the script, so you can see the summary results in a full page of 0s and a large file.

Just for reference-
I copied the test names  to a new sheet, removed duplicates
In the next column, names each test consecutively
Did a lookup formula to attach the new test name to the original file, then replaced the original file test names with the new ones.
After the summary, I can easily do another lookup on the list to get the original names back
musickmannData AnalystAuthor Commented:
Absolutely awesome. You have saved me so much time!
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.