# Count the frequency of words in a range

Hello Experts, i want to achive the following:
I have a single column in Excel with around 1000 rows filled with text up to around 30 words.
What I want to do is get a list of the most frequently used words, say the top 50 most frequent.
Is there a formula I can use to do this?
Kind Regards,
Kevin
###### Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Commented:
Couple of ways to achieve this.
one way would be to create a list of unique values from the original list then use Countif|() against each.

eg.
do
Data
Filter
Copy list to another location
tick uniuqe records only
then select a column to place these values

then against each enter the formular
eg. asuming the original list is in col a and the unique list is in col c
=countif(A:A,C1)

this will give you a count against each unique value

steve
Author Commented:
I am not sure i made myself clear, the text in each cell is the title text from an IT support ticket:
eg "The user has a problem with outlook" or "User's Outlook has crashed" in which case the most frequent words are "Outlook" with 2 and "has" with 2. Does that make sense?
kevin
ConsultantCommented:
Place the code below in any general code module. On the worksheet, select a range of cells two columns wide and 50 rows tall. In the formula bar enter this multiple-cell array formula:

=RankedValuesByFrequency(A1:A1000,50)

Press CTRL+SHIFT+ENTER to enter the formula.

[Begin Code Segment]

Public Function RankedValuesByFrequency( _
ByRef ValueList As Range, _
ByVal MaxCount As Long, _
Optional ByVal HighValues As Boolean = True _
) As Variant

' Create a list of the most or least frequently occurring tokens in a range in
' descending or ascending order by frequency along with the number of
' occurrences. If used as a UDF it must be entered as a multiple-cell array
' formula in a range two columns wide.
'
' Syntax
'
' RankedValuesByFrequency(ValueList, MaxCount, [HighValues])
'
' ValueList - An array of text values. It can be any shape or size.
'
' MaxCount - The maximum number of items to list in the results.
'
' HighValues - Specify True to list the most frequently occurring values, pass
'    False to list the least frequently occurring. Optional. If omitted then
'    True is assumed.
'
' © 2008 Kevin Jones

Dim Result As Variant
Dim Cell As Range
Dim Token As Variant
Dim Index As Long
Dim ValueIndex As New Collection
Dim Values As Variant
Dim Counts() As Long
Dim Indices() As Long
Dim Index1 As Long
Dim Index2 As Long
Dim Temp As Double

ReDim Result(1 To Application.Caller.Rows.Count, 1 To Application.Caller.Columns.Count)

' Inialize ValueIndex to empty so errors don't appear in spreadsheet
For Index1 = 1 To Application.Caller.Rows.Count
For Index2 = 1 To Application.Caller.Columns.Count
Result(Index1, Index2) = ""
Next Index2
Next Index1

' Create collection of all unique ValueIndex
Values = Array()
On Error Resume Next
For Each Cell In ValueList
If Len(Cell) > 0 Then
For Each Token In Split(Cell)
Err.Clear
ValueIndex.Add ValueIndex.Count + 1, CStr(Token)
If Err.Number = 0 Then
ReDim Preserve Values(LBound(Values) To UBound(Values) + 1)
Values(UBound(Values)) = CStr(Token)
End If
Next Token
End If
Next Cell
On Error GoTo 0

ReDim Counts(1 To ValueIndex.Count)
ReDim Indices(1 To ValueIndex.Count)

' Calculate counts
For Each Cell In ValueList
If Len(Cell) > 0 Then
For Each Token In Split(Cell)
Counts(ValueIndex(CStr(Token))) = Counts(ValueIndex(CStr(Token))) + 1
Next Token
End If
Next Cell

' Sort counts
For Index1 = 1 To ValueIndex.Count
Indices(Index1) = Index1
Next Index1
If ValueIndex.Count > 1 Then
For Index1 = 1 To ValueIndex.Count - 1
For Index2 = Index1 + 1 To ValueIndex.Count
If HighValues And Counts(Indices(Index2)) > Counts(Indices(Index1)) Or Not HighValues And Counts(Indices(Index2)) < Counts(Indices(Index1)) Then
Temp = Indices(Index2)
Indices(Index2) = Indices(Index1)
Indices(Index1) = Temp
End If
Next Index2
Next Index1
End If

' Set return ValueIndex
If MaxCount < 1 Then
MaxCount = ValueIndex.Count
End If
For Index1 = 1 To Application.Min(UBound(Result), ValueIndex.Count, MaxCount)
Result(Index1, 1) = Values(Indices(Index1) - 1)
Result(Index1, 2) = Counts(Indices(Index1))
Next Index1

' Add note if destination range not large enough
If MaxCount > UBound(Result) Then
Result(UBound(Result)) = "More..."
End If

' Return results
RankedValuesByFrequency = Result

End Function

[End Code Segment]

To add VBA code to a regular or general module in an Excel workbook, press ALT+F11 to open the VBA development environment (VBE). Select the menu command Insert->Module to create a new VBA module. Paste the code into the document window that appears. Press ALT+F11 to return to the Excel workbook.

Kevin
ConsultantCommented:
An improved version of the above UDF. Same instructions.

[Begin Code Segment]

Public Function RankedValuesByFrequency( _
ByRef ValueList As Range, _
Optional ByVal MaxCount As Long, _
Optional ByVal HighValues As Boolean = True _
) As Variant

' Create a list of the most or least frequently occurring tokens in a range in
' descending or ascending order by frequency along with the number of
' occurrences. If used as a UDF it must be entered as a multiple-cell array
' formula in a range two columns wide.
'
' Syntax
'
' RankedValuesByFrequency(ValueList, MaxCount, [HighValues])
'
' ValueList - An array of text values. It can be any shape or size.
'
' MaxCount - The maximum number of items to list in the results.
'
' HighValues - Specify True to list the most frequently occurring values, pass
'    False to list the least frequently occurring. Optional. If omitted then
'    True is assumed.
'
' © 2008 Kevin Jones

Dim Result As Variant
Dim Cell As Range
Dim Token As Variant
Dim Index As Long
Dim ValueIndex As New Collection
Dim Values As Variant
Dim Counts() As Long
Dim Indices() As Long
Dim Index1 As Long
Dim Index2 As Long
Dim Temp As Double

ReDim Result(1 To Application.Caller.Rows.Count, 1 To Application.Caller.Columns.Count)

If MaxCount = 0 Then MaxCount = Application.Caller.Rows.Count + 1

' Inialize ValueIndex to empty so errors don't appear in spreadsheet
For Index1 = 1 To Application.Caller.Rows.Count
For Index2 = 1 To Application.Caller.Columns.Count
Result(Index1, Index2) = ""
Next Index2
Next Index1

' Create collection of all unique ValueIndex
Values = Array()
On Error Resume Next
For Each Cell In ValueList
If Len(Cell) > 0 Then
For Each Token In Split(Cell)
Err.Clear
ValueIndex.Add ValueIndex.Count + 1, CStr(Token)
If Err.Number = 0 Then
ReDim Preserve Values(LBound(Values) To UBound(Values) + 1)
Values(UBound(Values)) = CStr(Token)
End If
Next Token
End If
Next Cell
On Error GoTo 0

ReDim Counts(1 To ValueIndex.Count)
ReDim Indices(1 To ValueIndex.Count)

' Calculate counts
For Each Cell In ValueList
If Len(Cell) > 0 Then
For Each Token In Split(Cell)
Counts(ValueIndex(CStr(Token))) = Counts(ValueIndex(CStr(Token))) + 1
Next Token
End If
Next Cell

' Sort counts
For Index1 = 1 To ValueIndex.Count
Indices(Index1) = Index1
Next Index1
If ValueIndex.Count > 1 Then
For Index1 = 1 To ValueIndex.Count - 1
For Index2 = Index1 + 1 To ValueIndex.Count
If HighValues And Counts(Indices(Index2)) > Counts(Indices(Index1)) Or Not HighValues And Counts(Indices(Index2)) < Counts(Indices(Index1)) Then
Temp = Indices(Index2)
Indices(Index2) = Indices(Index1)
Indices(Index1) = Temp
End If
Next Index2
Next Index1
End If

' Set return values
If MaxCount < 1 Then
MaxCount = ValueIndex.Count
End If
For Index1 = 1 To Application.Min(UBound(Result, 1), ValueIndex.Count, MaxCount)
Result(Index1, 1) = Values(Indices(Index1) - 1)
If Application.Caller.Columns.Count > 1 Then Result(Index1, 2) = Counts(Indices(Index1))
Next Index1

' Add note if destination range is not large enough
If MaxCount > UBound(Result, 1) And ValueIndex.Count > UBound(Result, 1) Then
Result(UBound(Result, 1), 1) = "More..."
End If

' Return results
RankedValuesByFrequency = Result

End Function

[End Code Segment]

Kevin

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Author Commented:
That is BRILLIANT!!! Thank you Kevin.
Commented:
Nice routine
Here is a function that will remove punctuation.
Right now the RankedValuesByFrequency function will return separate results for say
Oracle  23
Oracle, 8
Oracle! 39
If you install the attached NoPunctuation function then you would get
Oracle 70
Regards
Brian

``````Function NoPunctuation(ByVal T)
Option Base 0
T = LTrim(RTrim(T))
s = Array(",", ".", ";", ":", "?", "!", "/", "\", "(", ")", "[", "]", "{", "}", "'", """")
For i = 0 To UBound(s): T = Replace(T, s(i), " "): Next i 'change punctuation to blanks
Do Until UBound(Split(T, "  ")) = 0: T = Join(Split(T, "  "), " "): Loop 'remove extra blanks
NoPunctuation = T
End Function
``````
###### It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Excel

From novice to tech pro — start learning today.