Solved

# Count occurrence of pattern in excel column

Posted on 2015-01-22
204 Views
Last Modified: 2015-01-25
I'm looking for a way to look through a column and determine the occurrences of specific patterns. The file contains test answer keys with several columns, the important ones are:
question number, answer, test name
As the file contains a list of many tests, the results need to be based per test, sort of like the subtotal option for "at change in X column"

There are two main patterns we need to identify:
- How many times answer X occurs consequentially
- How many times each answer choice occurs in the test

I would think maybe highlighting any time a specific answer occurs 4 or more times in a row would be one good output. And at the end of each test, a row that lists the counts of each answer choice.

At the moment, the answer choices are stored as numbers (ordinal rather than letters) as text. I've got a simple nested if formula to translate the numbers to corresponding letters, I would plan on running that prior to running the pattern analysis.
0
Question by:musickmann
[X]
###### Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

• Help others & share knowledge
• Earn cash & points
• Learn & ask questions
• 10
• 9
19 Comments

LVL 9

Expert Comment

ID: 40564896
The first one is pretty simple.  To count the instances of "Skipper" in column A, use this function
=COUNTIF(A:A,"Skipper")

I'll try to get something about the second one together.
0

LVL 9

Expert Comment

ID: 40565022
If you don't have a lot of answer choices, you can use multiple CountIf statements like I did above.  That would be the quickest way.  Aside from writing a VB module or using a PowerPivot table, dynamically getting the counts might be difficult.

Column A          Column B
[Answer1]          =COUNTIF(A:A, \$A1)
[Answer2]          =COUNTIF(A:A, \$A2)
[Answer3]          =COUNTIF(A:A, \$A3)
[Answer4]          =COUNTIF(A:A, \$A4)
[Answer5]          =COUNTIF(A:A, \$A5)
[Answer6]          =COUNTIF(A:A, \$A6)
0

LVL 9

Expert Comment

ID: 40565068
Here's a VBA function I wrote that'll do what you need.  I included comments inline.

``````Sub CountUniqueInstances()
Dim Col As New Collection
Dim itm
Dim i As Long
Dim CellVal As Variant

'Get the last row which has active data
TheLastRow = ActiveSheet.UsedRange.SpecialCells(xlCellTypeLastCell).Row

'Loop through the single column of each row adding unique values to the collection
For i = 1 To TheLastRow
CellVal = Sheets("Sheet1").Range("A" & i).Value

If CellVal <> "" Then
On Error Resume Next
Col.Add CellVal, Chr(34) & CellVal & Chr(34)
On Error GoTo 0
End If

Next i

'Each value will be written on row 1 starting at column B
ColumnIndex = 2 'if you want the column to be C, change this to 3, for D, change this to 4, etc.
For Each itm In Col
'count the occurrence of each unique value found
iVal = Application.WorksheetFunction.CountIf(Range("A:A"), itm)
Cells(1, ColumnIndex) = itm & ":" & iVal
ColumnIndex = ColumnIndex + 1
Next
End Sub
``````
0

LVL 2

Author Comment

ID: 40565183
Thanks. The countif option is something I started with, and I thought would get me part of the way.

so all the answer choices are 1, 2, 3, or 4, which correspond to A, B, C, D

Since the file I have contains hundreds of tests with about 50 questions each, I'm looking for a way to do this en mass.

You would be a true savior if there was a way to create a new sheet that had a summary of some kind.

I'm attaching a file that contains a single test of dummy data, I added a summary sheet as an example of what would be ideal. The "Odd pattern" column would refer to something I could specify somewhere. I would essentially just make a list of what I would consider odd patterns since that is more complicated I know. But the counts would be the first step across the entire document.
0

LVL 2

Author Comment

ID: 40565202
I'm not sure I follow the script you have.

I changed line 12 to   Range("C" & i) since I have the data I need to count in column C.
I changed line 23 to ColumnIndex = 3 for the same reason
I cahnged line 26 to Range(C:C) again for the same reason

Maybe I shouldn't have adjusted those? The result I get in the first row doesn't seem to make sense.
This is Column C that I'm trying to count and what came up in the first row when running the script:

correct_answer_ordinal:1
correct_answer_ordinal
3
3
4
3
3
4
0

LVL 9

Expert Comment

ID: 40565206
I don't think the attachment got included.
0

LVL 2

Author Comment

ID: 40565219
Sorry, I guess I didn't click the Upload button after selecting the file. doh.
AnswerKeyProject.xlsx
0

LVL 9

Expert Comment

ID: 40565318
Maybe try to set the CountIf statements on the Summary Sheet.  It'll automatically update based on the values.

If you add another sheet (i.e. Test), you just have to copy\paste a new row to the summary sheet and edit the sheet name in the 4 formulas.
AnswerKeyProject.xlsx
0

LVL 2

Author Comment

ID: 40567703
That is a great simple solution for the counting of the occurrences. Right now I have about 300 tests or so in a single sheet, is there some way to use a script to run through the sheet, then automatically fill in the summary page, making an entry each time the test name column changes? I thought of the subtotal feature, but I don't know how to make that work counting the occurances of the 4 different answer options in the single column.

Do you have any thoughts on identifying patters? The most critical pattern would be to identify when there are 4 or more of the same answer consecutively.
0

LVL 9

Expert Comment

ID: 40567740
I'm sure a script could be written. Give me a little bit to see what I can come up with.  So just to be clear, there is only 1 sheet with all the tests.  Do you have a more accurate example I can play with?
0

LVL 2

Author Comment

ID: 40567747
Thanks!

Here is a more complete sample set.
AnswerKeys.xlsx
0

LVL 9

Expert Comment

ID: 40567828
I'm glad I asked for an updated version because the PP column wasn't on the first version.

Here's the scoop.  I got a working version.
1. Open the excel doc I've attached.  If you get an "Enable Macros" warning, click the "Enable Content" button.  Because there's code running behind it, it'll give you this warning unless you change the security settings.
2. Next, Press "Alt" + F8 to open the macro window
3. Run the "CalculateTestResults" macro
4. Check out the results on the "Summary" tab.
5. Change one of the tests to have 4 concurrent values.  Rerun the macro
The OddPatterns column will now show a "Y" and be highlighted.

Here are the instructions going forward:
1. Unless you're familiar with creating Excel VB Macros, you'll have to copy your data into this worksheet I provided.
2. Copy the data into the "Tests" sheet of the workbook.
PLEASE NOTE, the code I wrote is dependent on the column ordering.  If you add, change, or remove columns, someone will have to change the code.
3. Press ALT + F8 to open the macro window.
4. Run the "CalculateTestResults" Macro
5. Check out the results on the summary tab.

Enjoy!
AnswerKeys.xlsm
0

LVL 2

Author Comment

ID: 40568850
So, I'm getting an error:
Run-time error '1004'
Application-defined or object-defined error

The debug option points to this line:
Sheets("Summary").Cells(TestNumber + 2, 1).Value = "Test " & TestNumber

The only difference in data from the sample file is the test name perhaps? The names can contain spaces, dashes, periods, and underscores. I think they are limited to 100 characters in length. Maybe that has something to do with it?

The export file has so much extra info, I tried removing the extraneous stuff so it didn't cause confusion, so the PP column is just points possible on the item.

While I can't write macros, I do have some limited programming skills, most of which is being able to read through and make small changes.

I'm going to try my hand at tracking down the error. I am running on a Mac, and also tried using a windows VM, so not sure if that has anything to do with this.
0

LVL 9

Expert Comment

ID: 40568854
Did the file work the first time you ran it before you made changes?  The problem is based on what you gave me, i assumed all the test names would be "test x".  Hence my request for a more real-life sample set.  :-)  Now that you said its not always the case, there is an issue.  I may be able to tweak the code to look at the previous test name and compare it to the current one to see if they are different.  Lemme take another look.
0

LVL 2

Author Comment

ID: 40568860
Yeah, I was afraid of that. I had to be careful given the source of the data is secure testing material, and since these posts can become publicly available, I was trying to keep things generic as possible.

The number of questions per test is not consistent either, so one test may have 50, one may have 40, etc. I think what I can do is manually replace the test names to match the text x expected, then reconnect the official names after running the analysis.
0

LVL 9

Expert Comment

ID: 40568866
The question count won't matter, the code will handle that. I'll see if I can make the test names more flexible. If it's a ton of work I'll let you know.  You can send me a clear example to my personal email if you want. It'll definitely make testing the new code easier.  Feedback@DRECustom.com
0

LVL 2

Author Comment

ID: 40568867
Well, here is an exact data dump, I did a replacement of test names, so this is the full file I'm working with, same error.

This is the file, after running the script, so you can see the summary results in a full page of 0s and a large file.

Just for reference-
I copied the test names  to a new sheet, removed duplicates
In the next column, names each test consecutively
Did a lookup formula to attach the new test name to the original file, then replaced the original file test names with the new ones.
After the summary, I can easily do another lookup on the list to get the original names back
AnswerKeys-1.xlsm
0

LVL 9

Accepted Solution

skipper68 earned 500 total points
ID: 40568894
The reason the file got so large is because the code went rogue because of the test names.  There were over a million rows of summary data! :-)

I've attached a new version.  Run it as is and let me know what happens.

It's more flexible now that it just searches for the next test name in the sheet and compares it to the previous one it had.  If it matches, it continues the count, if it doesn't, it assumes it's the last record and prints out the summary row.

Let me know how this works.
AnswerKeys-New.xlsm
0

LVL 2

Author Closing Comment

ID: 40569203
Absolutely awesome. You have saved me so much time!
0

## Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Deploying a Microsoft Access application in a Citrix environment is not difficult but takes a few steps. However, Citrix system people are often of little help, as they typically know next to nothing about Access. The script provided here will take â€¦
With User Account Control (UAC) enabled in Windows 7, one needs to open an elevated Command Prompt in order to run scripts under administrative privileges. Although the elevated Command Prompt accomplishes the task, the question How to run as scriptâ€¦
This Micro Tutorial will demonstrate how to use a scrolling table in Microsoft Excel using the INDEX function.
Finds all prime numbers in a range requested and places them in a public primes() array. I've demostrated a template size of 30 (2 * 3 * 5) but larger templates can be built such 210  (2 * 3 * 5 * 7) or 2310  (2 * 3 * 5 * 7 * 11). The larger templaâ€¦
###### Suggested Courses
Course of the Month5 days, 11 hours left to enroll

#### 734 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.