kevinsmith121
asked on
Count the frequency of words in a range
Hello Experts, i want to achive the following:
I have a single column in Excel with around 1000 rows filled with text up to around 30 words.
What I want to do is get a list of the most frequently used words, say the top 50 most frequent.
Is there a formula I can use to do this?
Kind Regards,
Kevin
I have a single column in Excel with around 1000 rows filled with text up to around 30 words.
What I want to do is get a list of the most frequently used words, say the top 50 most frequent.
Is there a formula I can use to do this?
Kind Regards,
Kevin
ASKER
I am not sure i made myself clear, the text in each cell is the title text from an IT support ticket:
eg "The user has a problem with outlook" or "User's Outlook has crashed" in which case the most frequent words are "Outlook" with 2 and "has" with 2. Does that make sense?
kevin
eg "The user has a problem with outlook" or "User's Outlook has crashed" in which case the most frequent words are "Outlook" with 2 and "has" with 2. Does that make sense?
kevin
Place the code below in any general code module. On the worksheet, select a range of cells two columns wide and 50 rows tall. In the formula bar enter this multiple-cell array formula:
=RankedValuesByFrequency(A 1:A1000,50 )
Press CTRL+SHIFT+ENTER to enter the formula.
[Begin Code Segment]
Public Function RankedValuesByFrequency( _
ByRef ValueList As Range, _
ByVal MaxCount As Long, _
Optional ByVal HighValues As Boolean = True _
) As Variant
' Create a list of the most or least frequently occurring tokens in a range in
' descending or ascending order by frequency along with the number of
' occurrences. If used as a UDF it must be entered as a multiple-cell array
' formula in a range two columns wide.
'
' Syntax
'
' RankedValuesByFrequency(Va lueList, MaxCount, [HighValues])
'
' ValueList - An array of text values. It can be any shape or size.
'
' MaxCount - The maximum number of items to list in the results.
'
' HighValues - Specify True to list the most frequently occurring values, pass
' False to list the least frequently occurring. Optional. If omitted then
' True is assumed.
'
' © 2008 Kevin Jones
Dim Result As Variant
Dim Cell As Range
Dim Token As Variant
Dim Index As Long
Dim ValueIndex As New Collection
Dim Values As Variant
Dim Counts() As Long
Dim Indices() As Long
Dim Index1 As Long
Dim Index2 As Long
Dim Temp As Double
ReDim Result(1 To Application.Caller.Rows.Co unt, 1 To Application.Caller.Columns .Count)
' Inialize ValueIndex to empty so errors don't appear in spreadsheet
For Index1 = 1 To Application.Caller.Rows.Co unt
For Index2 = 1 To Application.Caller.Columns .Count
Result(Index1, Index2) = ""
Next Index2
Next Index1
' Create collection of all unique ValueIndex
Values = Array()
On Error Resume Next
For Each Cell In ValueList
If Len(Cell) > 0 Then
For Each Token In Split(Cell)
Err.Clear
ValueIndex.Add ValueIndex.Count + 1, CStr(Token)
If Err.Number = 0 Then
ReDim Preserve Values(LBound(Values) To UBound(Values) + 1)
Values(UBound(Values)) = CStr(Token)
End If
Next Token
End If
Next Cell
On Error GoTo 0
ReDim Counts(1 To ValueIndex.Count)
ReDim Indices(1 To ValueIndex.Count)
' Calculate counts
For Each Cell In ValueList
If Len(Cell) > 0 Then
For Each Token In Split(Cell)
Counts(ValueIndex(CStr(Tok en))) = Counts(ValueIndex(CStr(Tok en))) + 1
Next Token
End If
Next Cell
' Sort counts
For Index1 = 1 To ValueIndex.Count
Indices(Index1) = Index1
Next Index1
If ValueIndex.Count > 1 Then
For Index1 = 1 To ValueIndex.Count - 1
For Index2 = Index1 + 1 To ValueIndex.Count
If HighValues And Counts(Indices(Index2)) > Counts(Indices(Index1)) Or Not HighValues And Counts(Indices(Index2)) < Counts(Indices(Index1)) Then
Temp = Indices(Index2)
Indices(Index2) = Indices(Index1)
Indices(Index1) = Temp
End If
Next Index2
Next Index1
End If
' Set return ValueIndex
If MaxCount < 1 Then
MaxCount = ValueIndex.Count
End If
For Index1 = 1 To Application.Min(UBound(Res ult), ValueIndex.Count, MaxCount)
Result(Index1, 1) = Values(Indices(Index1) - 1)
Result(Index1, 2) = Counts(Indices(Index1))
Next Index1
' Add note if destination range not large enough
If MaxCount > UBound(Result) Then
Result(UBound(Result)) = "More..."
End If
' Return results
RankedValuesByFrequency = Result
End Function
[End Code Segment]
To add VBA code to a regular or general module in an Excel workbook, press ALT+F11 to open the VBA development environment (VBE). Select the menu command Insert->Module to create a new VBA module. Paste the code into the document window that appears. Press ALT+F11 to return to the Excel workbook.
Kevin
=RankedValuesByFrequency(A
Press CTRL+SHIFT+ENTER to enter the formula.
[Begin Code Segment]
Public Function RankedValuesByFrequency( _
ByRef ValueList As Range, _
ByVal MaxCount As Long, _
Optional ByVal HighValues As Boolean = True _
) As Variant
' Create a list of the most or least frequently occurring tokens in a range in
' descending or ascending order by frequency along with the number of
' occurrences. If used as a UDF it must be entered as a multiple-cell array
' formula in a range two columns wide.
'
' Syntax
'
' RankedValuesByFrequency(Va
'
' ValueList - An array of text values. It can be any shape or size.
'
' MaxCount - The maximum number of items to list in the results.
'
' HighValues - Specify True to list the most frequently occurring values, pass
' False to list the least frequently occurring. Optional. If omitted then
' True is assumed.
'
' © 2008 Kevin Jones
Dim Result As Variant
Dim Cell As Range
Dim Token As Variant
Dim Index As Long
Dim ValueIndex As New Collection
Dim Values As Variant
Dim Counts() As Long
Dim Indices() As Long
Dim Index1 As Long
Dim Index2 As Long
Dim Temp As Double
ReDim Result(1 To Application.Caller.Rows.Co
' Inialize ValueIndex to empty so errors don't appear in spreadsheet
For Index1 = 1 To Application.Caller.Rows.Co
For Index2 = 1 To Application.Caller.Columns
Result(Index1, Index2) = ""
Next Index2
Next Index1
' Create collection of all unique ValueIndex
Values = Array()
On Error Resume Next
For Each Cell In ValueList
If Len(Cell) > 0 Then
For Each Token In Split(Cell)
Err.Clear
ValueIndex.Add ValueIndex.Count + 1, CStr(Token)
If Err.Number = 0 Then
ReDim Preserve Values(LBound(Values) To UBound(Values) + 1)
Values(UBound(Values)) = CStr(Token)
End If
Next Token
End If
Next Cell
On Error GoTo 0
ReDim Counts(1 To ValueIndex.Count)
ReDim Indices(1 To ValueIndex.Count)
' Calculate counts
For Each Cell In ValueList
If Len(Cell) > 0 Then
For Each Token In Split(Cell)
Counts(ValueIndex(CStr(Tok
Next Token
End If
Next Cell
' Sort counts
For Index1 = 1 To ValueIndex.Count
Indices(Index1) = Index1
Next Index1
If ValueIndex.Count > 1 Then
For Index1 = 1 To ValueIndex.Count - 1
For Index2 = Index1 + 1 To ValueIndex.Count
If HighValues And Counts(Indices(Index2)) > Counts(Indices(Index1)) Or Not HighValues And Counts(Indices(Index2)) < Counts(Indices(Index1)) Then
Temp = Indices(Index2)
Indices(Index2) = Indices(Index1)
Indices(Index1) = Temp
End If
Next Index2
Next Index1
End If
' Set return ValueIndex
If MaxCount < 1 Then
MaxCount = ValueIndex.Count
End If
For Index1 = 1 To Application.Min(UBound(Res
Result(Index1, 1) = Values(Indices(Index1) - 1)
Result(Index1, 2) = Counts(Indices(Index1))
Next Index1
' Add note if destination range not large enough
If MaxCount > UBound(Result) Then
Result(UBound(Result)) = "More..."
End If
' Return results
RankedValuesByFrequency = Result
End Function
[End Code Segment]
To add VBA code to a regular or general module in an Excel workbook, press ALT+F11 to open the VBA development environment (VBE). Select the menu command Insert->Module to create a new VBA module. Paste the code into the document window that appears. Press ALT+F11 to return to the Excel workbook.
Kevin
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
That is BRILLIANT!!! Thank you Kevin.
Nice routine
Here is a function that will remove punctuation.
Right now the RankedValuesByFrequency function will return separate results for say
Oracle 23
Oracle, 8
Oracle! 39
If you install the attached NoPunctuation function then you would get
Oracle 70
Regards
Brian
Here is a function that will remove punctuation.
Right now the RankedValuesByFrequency function will return separate results for say
Oracle 23
Oracle, 8
Oracle! 39
If you install the attached NoPunctuation function then you would get
Oracle 70
Regards
Brian
Function NoPunctuation(ByVal T)
Option Base 0
T = LTrim(RTrim(T))
s = Array(",", ".", ";", ":", "?", "!", "/", "\", "(", ")", "[", "]", "{", "}", "'", """")
For i = 0 To UBound(s): T = Replace(T, s(i), " "): Next i 'change punctuation to blanks
Do Until UBound(Split(T, " ")) = 0: T = Join(Split(T, " "), " "): Loop 'remove extra blanks
NoPunctuation = T
End Function
one way would be to create a list of unique values from the original list then use Countif|() against each.
eg.
do
Data
Filter
Advanced
Copy list to another location
tick uniuqe records only
then select a column to place these values
then against each enter the formular
eg. asuming the original list is in col a and the unique list is in col c
=countif(A:A,C1)
this will give you a count against each unique value
steve