Link to home
Start Free TrialLog in
Avatar of kevinsmith121
kevinsmith121

asked on

Count the frequency of words in a range

Hello Experts, i want to achive the following:
I have a single column in Excel with around 1000 rows filled with text up to around 30 words.
What I want to do is get a list of the most frequently used words, say the top 50 most frequent.
Is there a formula I can use to do this?
Kind Regards,
Kevin
Avatar of dambuster99
dambuster99

Couple of ways to achieve this.
one way would be to create a list of unique values from the original list then use Countif|() against each.

eg.
do
Data
Filter
Advanced
Copy list to another location
tick uniuqe records only
then select a column to place these values

then against each enter the formular
eg. asuming the original list is in col a and the unique list is in col c
=countif(A:A,C1)

this will give you a count against each unique value

steve
Avatar of kevinsmith121

ASKER

I am not sure i made myself clear, the text in each cell is the title text from an IT support ticket:
eg "The user has a problem with outlook" or "User's Outlook has crashed" in which case the most frequent words are "Outlook" with 2 and "has" with 2. Does that make sense?
kevin
Avatar of zorvek (Kevin Jones)
Place the code below in any general code module. On the worksheet, select a range of cells two columns wide and 50 rows tall. In the formula bar enter this multiple-cell array formula:

   =RankedValuesByFrequency(A1:A1000,50)

Press CTRL+SHIFT+ENTER to enter the formula.

[Begin Code Segment]

Public Function RankedValuesByFrequency( _
      ByRef ValueList As Range, _
      ByVal MaxCount As Long, _
      Optional ByVal HighValues As Boolean = True _
   ) As Variant

' Create a list of the most or least frequently occurring tokens in a range in
' descending or ascending order by frequency along with the number of
' occurrences. If used as a UDF it must be entered as a multiple-cell array
' formula in a range two columns wide.
'
' Syntax
'
' RankedValuesByFrequency(ValueList, MaxCount, [HighValues])
'
' ValueList - An array of text values. It can be any shape or size.
'
' MaxCount - The maximum number of items to list in the results.
'
' HighValues - Specify True to list the most frequently occurring values, pass
'    False to list the least frequently occurring. Optional. If omitted then
'    True is assumed.
'
' © 2008 Kevin Jones

   Dim Result As Variant
   Dim Cell As Range
   Dim Token As Variant
   Dim Index As Long
   Dim ValueIndex As New Collection
   Dim Values As Variant
   Dim Counts() As Long
   Dim Indices() As Long
   Dim Index1 As Long
   Dim Index2 As Long
   Dim Temp As Double
   
   ReDim Result(1 To Application.Caller.Rows.Count, 1 To Application.Caller.Columns.Count)

   ' Inialize ValueIndex to empty so errors don't appear in spreadsheet
   For Index1 = 1 To Application.Caller.Rows.Count
      For Index2 = 1 To Application.Caller.Columns.Count
         Result(Index1, Index2) = ""
      Next Index2
   Next Index1
   
   ' Create collection of all unique ValueIndex
   Values = Array()
   On Error Resume Next
   For Each Cell In ValueList
      If Len(Cell) > 0 Then
         For Each Token In Split(Cell)
            Err.Clear
            ValueIndex.Add ValueIndex.Count + 1, CStr(Token)
            If Err.Number = 0 Then
               ReDim Preserve Values(LBound(Values) To UBound(Values) + 1)
               Values(UBound(Values)) = CStr(Token)
            End If
         Next Token
      End If
   Next Cell
   On Error GoTo 0
   
   ReDim Counts(1 To ValueIndex.Count)
   ReDim Indices(1 To ValueIndex.Count)
   
   ' Calculate counts
   For Each Cell In ValueList
      If Len(Cell) > 0 Then
         For Each Token In Split(Cell)
            Counts(ValueIndex(CStr(Token))) = Counts(ValueIndex(CStr(Token))) + 1
         Next Token
      End If
   Next Cell
   
   ' Sort counts
   For Index1 = 1 To ValueIndex.Count
      Indices(Index1) = Index1
   Next Index1
   If ValueIndex.Count > 1 Then
      For Index1 = 1 To ValueIndex.Count - 1
         For Index2 = Index1 + 1 To ValueIndex.Count
            If HighValues And Counts(Indices(Index2)) > Counts(Indices(Index1)) Or Not HighValues And Counts(Indices(Index2)) < Counts(Indices(Index1)) Then
               Temp = Indices(Index2)
               Indices(Index2) = Indices(Index1)
               Indices(Index1) = Temp
            End If
         Next Index2
      Next Index1
   End If
   
   ' Set return ValueIndex
   If MaxCount < 1 Then
      MaxCount = ValueIndex.Count
   End If
   For Index1 = 1 To Application.Min(UBound(Result), ValueIndex.Count, MaxCount)
      Result(Index1, 1) = Values(Indices(Index1) - 1)
      Result(Index1, 2) = Counts(Indices(Index1))
   Next Index1
   
   ' Add note if destination range not large enough
   If MaxCount > UBound(Result) Then
      Result(UBound(Result)) = "More..."
   End If
   
   ' Return results
   RankedValuesByFrequency = Result

End Function

[End Code Segment]

To add VBA code to a regular or general module in an Excel workbook, press ALT+F11 to open the VBA development environment (VBE). Select the menu command Insert->Module to create a new VBA module. Paste the code into the document window that appears. Press ALT+F11 to return to the Excel workbook.

Kevin
ASKER CERTIFIED SOLUTION
Avatar of zorvek (Kevin Jones)
zorvek (Kevin Jones)
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
That is BRILLIANT!!! Thank you Kevin.
Nice routine
Here is a function that will remove punctuation.
Right now the RankedValuesByFrequency function will return separate results for say
Oracle  23
Oracle, 8
Oracle! 39
If you install the attached NoPunctuation function then you would get
Oracle 70
Regards
Brian


Function NoPunctuation(ByVal T)
    Option Base 0
    T = LTrim(RTrim(T))
    s = Array(",", ".", ";", ":", "?", "!", "/", "\", "(", ")", "[", "]", "{", "}", "'", """")
    For i = 0 To UBound(s): T = Replace(T, s(i), " "): Next i 'change punctuation to blanks
    Do Until UBound(Split(T, "  ")) = 0: T = Join(Split(T, "  "), " "): Loop 'remove extra blanks
    NoPunctuation = T
End Function

Open in new window