Solved

Count occurrence of pattern in excel column

Posted on 2015-01-22
19
171 Views
Last Modified: 2015-01-25
I'm looking for a way to look through a column and determine the occurrences of specific patterns. The file contains test answer keys with several columns, the important ones are:
question number, answer, test name
As the file contains a list of many tests, the results need to be based per test, sort of like the subtotal option for "at change in X column"

There are two main patterns we need to identify:
- How many times answer X occurs consequentially
- How many times each answer choice occurs in the test

I would think maybe highlighting any time a specific answer occurs 4 or more times in a row would be one good output. And at the end of each test, a row that lists the counts of each answer choice.

At the moment, the answer choices are stored as numbers (ordinal rather than letters) as text. I've got a simple nested if formula to translate the numbers to corresponding letters, I would plan on running that prior to running the pattern analysis.
0
Comment
Question by:musickmann
  • 10
  • 9
19 Comments
 
LVL 9

Expert Comment

by:skipper68
ID: 40564896
The first one is pretty simple.  To count the instances of "Skipper" in column A, use this function
=COUNTIF(A:A,"Skipper")

I'll try to get something about the second one together.
0
 
LVL 9

Expert Comment

by:skipper68
ID: 40565022
If you don't have a lot of answer choices, you can use multiple CountIf statements like I did above.  That would be the quickest way.  Aside from writing a VB module or using a PowerPivot table, dynamically getting the counts might be difficult.

Column A          Column B
[Answer1]          =COUNTIF(A:A, $A1)
[Answer2]          =COUNTIF(A:A, $A2)
[Answer3]          =COUNTIF(A:A, $A3)
[Answer4]          =COUNTIF(A:A, $A4)
[Answer5]          =COUNTIF(A:A, $A5)
[Answer6]          =COUNTIF(A:A, $A6)
0
 
LVL 9

Expert Comment

by:skipper68
ID: 40565068
Here's a VBA function I wrote that'll do what you need.  I included comments inline.

Sub CountUniqueInstances()
    Dim Col As New Collection
    Dim itm
    Dim i As Long
    Dim CellVal As Variant
    
    'Get the last row which has active data
    TheLastRow = ActiveSheet.UsedRange.SpecialCells(xlCellTypeLastCell).Row
    
    'Loop through the single column of each row adding unique values to the collection
    For i = 1 To TheLastRow
        CellVal = Sheets("Sheet1").Range("A" & i).Value
        
        If CellVal <> "" Then
            On Error Resume Next
            Col.Add CellVal, Chr(34) & CellVal & Chr(34)
            On Error GoTo 0
        End If
        
    Next i

    'Each value will be written on row 1 starting at column B
    ColumnIndex = 2 'if you want the column to be C, change this to 3, for D, change this to 4, etc.
    For Each itm In Col
        'count the occurrence of each unique value found
        iVal = Application.WorksheetFunction.CountIf(Range("A:A"), itm)
        Cells(1, ColumnIndex) = itm & ":" & iVal
        ColumnIndex = ColumnIndex + 1
    Next
End Sub

Open in new window

0
 
LVL 2

Author Comment

by:musickmann
ID: 40565183
Thanks. The countif option is something I started with, and I thought would get me part of the way.

so all the answer choices are 1, 2, 3, or 4, which correspond to A, B, C, D

Since the file I have contains hundreds of tests with about 50 questions each, I'm looking for a way to do this en mass.

You would be a true savior if there was a way to create a new sheet that had a summary of some kind.

I'm attaching a file that contains a single test of dummy data, I added a summary sheet as an example of what would be ideal. The "Odd pattern" column would refer to something I could specify somewhere. I would essentially just make a list of what I would consider odd patterns since that is more complicated I know. But the counts would be the first step across the entire document.
0
 
LVL 2

Author Comment

by:musickmann
ID: 40565202
I'm not sure I follow the script you have.

I changed line 12 to   Range("C" & i) since I have the data I need to count in column C.
I changed line 23 to ColumnIndex = 3 for the same reason
I cahnged line 26 to Range(C:C) again for the same reason

Maybe I shouldn't have adjusted those? The result I get in the first row doesn't seem to make sense.
This is Column C that I'm trying to count and what came up in the first row when running the script:

correct_answer_ordinal:1
correct_answer_ordinal
3
3
4
3
3
4
0
 
LVL 9

Expert Comment

by:skipper68
ID: 40565206
I don't think the attachment got included.
0
 
LVL 2

Author Comment

by:musickmann
ID: 40565219
Sorry, I guess I didn't click the Upload button after selecting the file. doh.
AnswerKeyProject.xlsx
0
 
LVL 9

Expert Comment

by:skipper68
ID: 40565318
Maybe try to set the CountIf statements on the Summary Sheet.  It'll automatically update based on the values.

If you add another sheet (i.e. Test), you just have to copy\paste a new row to the summary sheet and edit the sheet name in the 4 formulas.
AnswerKeyProject.xlsx
0
 
LVL 2

Author Comment

by:musickmann
ID: 40567703
That is a great simple solution for the counting of the occurrences. Right now I have about 300 tests or so in a single sheet, is there some way to use a script to run through the sheet, then automatically fill in the summary page, making an entry each time the test name column changes? I thought of the subtotal feature, but I don't know how to make that work counting the occurances of the 4 different answer options in the single column.

Do you have any thoughts on identifying patters? The most critical pattern would be to identify when there are 4 or more of the same answer consecutively.
0
Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

 
LVL 9

Expert Comment

by:skipper68
ID: 40567740
I'm sure a script could be written. Give me a little bit to see what I can come up with.  So just to be clear, there is only 1 sheet with all the tests.  Do you have a more accurate example I can play with?
0
 
LVL 2

Author Comment

by:musickmann
ID: 40567747
Thanks!

Here is a more complete sample set.
AnswerKeys.xlsx
0
 
LVL 9

Expert Comment

by:skipper68
ID: 40567828
I'm glad I asked for an updated version because the PP column wasn't on the first version.

Here's the scoop.  I got a working version.  
1. Open the excel doc I've attached.  If you get an "Enable Macros" warning, click the "Enable Content" button.  Because there's code running behind it, it'll give you this warning unless you change the security settings.  
2. Next, Press "Alt" + F8 to open the macro window
3. Run the "CalculateTestResults" macro
4. Check out the results on the "Summary" tab.
5. Change one of the tests to have 4 concurrent values.  Rerun the macro
     The OddPatterns column will now show a "Y" and be highlighted.

Here are the instructions going forward:
1. Unless you're familiar with creating Excel VB Macros, you'll have to copy your data into this worksheet I provided.
2. Copy the data into the "Tests" sheet of the workbook.
     PLEASE NOTE, the code I wrote is dependent on the column ordering.  If you add, change, or remove columns, someone will have to change the code.
3. Press ALT + F8 to open the macro window.
4. Run the "CalculateTestResults" Macro
5. Check out the results on the summary tab.

Enjoy!
AnswerKeys.xlsm
0
 
LVL 2

Author Comment

by:musickmann
ID: 40568850
So, I'm getting an error:
Run-time error '1004'
Application-defined or object-defined error

The debug option points to this line:
Sheets("Summary").Cells(TestNumber + 2, 1).Value = "Test " & TestNumber

The only difference in data from the sample file is the test name perhaps? The names can contain spaces, dashes, periods, and underscores. I think they are limited to 100 characters in length. Maybe that has something to do with it?

The export file has so much extra info, I tried removing the extraneous stuff so it didn't cause confusion, so the PP column is just points possible on the item.

While I can't write macros, I do have some limited programming skills, most of which is being able to read through and make small changes.

I'm going to try my hand at tracking down the error. I am running on a Mac, and also tried using a windows VM, so not sure if that has anything to do with this.
0
 
LVL 9

Expert Comment

by:skipper68
ID: 40568854
Did the file work the first time you ran it before you made changes?  The problem is based on what you gave me, i assumed all the test names would be "test x".  Hence my request for a more real-life sample set.  :-)  Now that you said its not always the case, there is an issue.  I may be able to tweak the code to look at the previous test name and compare it to the current one to see if they are different.  Lemme take another look.
0
 
LVL 2

Author Comment

by:musickmann
ID: 40568860
Yeah, I was afraid of that. I had to be careful given the source of the data is secure testing material, and since these posts can become publicly available, I was trying to keep things generic as possible.

The number of questions per test is not consistent either, so one test may have 50, one may have 40, etc. I think what I can do is manually replace the test names to match the text x expected, then reconnect the official names after running the analysis.
0
 
LVL 9

Expert Comment

by:skipper68
ID: 40568866
The question count won't matter, the code will handle that. I'll see if I can make the test names more flexible. If it's a ton of work I'll let you know.  You can send me a clear example to my personal email if you want. It'll definitely make testing the new code easier.  Feedback@DRECustom.com
0
 
LVL 2

Author Comment

by:musickmann
ID: 40568867
Well, here is an exact data dump, I did a replacement of test names, so this is the full file I'm working with, same error.

This is the file, after running the script, so you can see the summary results in a full page of 0s and a large file.


Just for reference-
I copied the test names  to a new sheet, removed duplicates
In the next column, names each test consecutively
Did a lookup formula to attach the new test name to the original file, then replaced the original file test names with the new ones.
After the summary, I can easily do another lookup on the list to get the original names back
AnswerKeys-1.xlsm
0
 
LVL 9

Accepted Solution

by:
skipper68 earned 500 total points
ID: 40568894
The reason the file got so large is because the code went rogue because of the test names.  There were over a million rows of summary data! :-)

I've attached a new version.  Run it as is and let me know what happens.  

It's more flexible now that it just searches for the next test name in the sheet and compares it to the previous one it had.  If it matches, it continues the count, if it doesn't, it assumes it's the last record and prints out the summary row.

Let me know how this works.
AnswerKeys-New.xlsm
0
 
LVL 2

Author Closing Comment

by:musickmann
ID: 40569203
Absolutely awesome. You have saved me so much time!
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Improved? Move/Copy Add-in Replacement - How to avoid the annoying, “A formula or sheet you want to move or copy contains the name XXX, which already exists on the destination worksheet.” David Miller (dlmille)  It was one of those days… I wa…
This code takes an Excel list of URL’s and adds a header titled “URL List”. It then searches through all URL’s in column “A”, looking for duplicates. When a duplicate is found, it is moved to the top of the list. The duplicate URL’s are then highlig…
The viewer will learn how to simulate a series of coin tosses with the rand() function and learn how to make these “tosses” depend on a predetermined probability. Flipping Coins in Excel: Enter =RAND() into cell A2: Recalculate the random variable…
This Micro Tutorial demonstrate the bugs in Microsoft Excel for Mac with Pivot Charts.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now