Count the frequency of words in a range

Hello Experts, i want to achive the following:
I have a single column in Excel with around 1000 rows filled with text up to around 30 words.
What I want to do is get a list of the most frequently used words, say the top 50 most frequent.
Is there a formula I can use to do this?
Kind Regards,
Kevin
LVL 1
kevinsmith121Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

dambuster99Commented:
Couple of ways to achieve this.
one way would be to create a list of unique values from the original list then use Countif|() against each.

eg.
do
Data
Filter
Advanced
Copy list to another location
tick uniuqe records only
then select a column to place these values

then against each enter the formular
eg. asuming the original list is in col a and the unique list is in col c
=countif(A:A,C1)

this will give you a count against each unique value

steve
kevinsmith121Author Commented:
I am not sure i made myself clear, the text in each cell is the title text from an IT support ticket:
eg "The user has a problem with outlook" or "User's Outlook has crashed" in which case the most frequent words are "Outlook" with 2 and "has" with 2. Does that make sense?
kevin
zorvek (Kevin Jones)ConsultantCommented:
Place the code below in any general code module. On the worksheet, select a range of cells two columns wide and 50 rows tall. In the formula bar enter this multiple-cell array formula:

   =RankedValuesByFrequency(A1:A1000,50)

Press CTRL+SHIFT+ENTER to enter the formula.

[Begin Code Segment]

Public Function RankedValuesByFrequency( _
      ByRef ValueList As Range, _
      ByVal MaxCount As Long, _
      Optional ByVal HighValues As Boolean = True _
   ) As Variant

' Create a list of the most or least frequently occurring tokens in a range in
' descending or ascending order by frequency along with the number of
' occurrences. If used as a UDF it must be entered as a multiple-cell array
' formula in a range two columns wide.
'
' Syntax
'
' RankedValuesByFrequency(ValueList, MaxCount, [HighValues])
'
' ValueList - An array of text values. It can be any shape or size.
'
' MaxCount - The maximum number of items to list in the results.
'
' HighValues - Specify True to list the most frequently occurring values, pass
'    False to list the least frequently occurring. Optional. If omitted then
'    True is assumed.
'
' © 2008 Kevin Jones

   Dim Result As Variant
   Dim Cell As Range
   Dim Token As Variant
   Dim Index As Long
   Dim ValueIndex As New Collection
   Dim Values As Variant
   Dim Counts() As Long
   Dim Indices() As Long
   Dim Index1 As Long
   Dim Index2 As Long
   Dim Temp As Double
   
   ReDim Result(1 To Application.Caller.Rows.Count, 1 To Application.Caller.Columns.Count)

   ' Inialize ValueIndex to empty so errors don't appear in spreadsheet
   For Index1 = 1 To Application.Caller.Rows.Count
      For Index2 = 1 To Application.Caller.Columns.Count
         Result(Index1, Index2) = ""
      Next Index2
   Next Index1
   
   ' Create collection of all unique ValueIndex
   Values = Array()
   On Error Resume Next
   For Each Cell In ValueList
      If Len(Cell) > 0 Then
         For Each Token In Split(Cell)
            Err.Clear
            ValueIndex.Add ValueIndex.Count + 1, CStr(Token)
            If Err.Number = 0 Then
               ReDim Preserve Values(LBound(Values) To UBound(Values) + 1)
               Values(UBound(Values)) = CStr(Token)
            End If
         Next Token
      End If
   Next Cell
   On Error GoTo 0
   
   ReDim Counts(1 To ValueIndex.Count)
   ReDim Indices(1 To ValueIndex.Count)
   
   ' Calculate counts
   For Each Cell In ValueList
      If Len(Cell) > 0 Then
         For Each Token In Split(Cell)
            Counts(ValueIndex(CStr(Token))) = Counts(ValueIndex(CStr(Token))) + 1
         Next Token
      End If
   Next Cell
   
   ' Sort counts
   For Index1 = 1 To ValueIndex.Count
      Indices(Index1) = Index1
   Next Index1
   If ValueIndex.Count > 1 Then
      For Index1 = 1 To ValueIndex.Count - 1
         For Index2 = Index1 + 1 To ValueIndex.Count
            If HighValues And Counts(Indices(Index2)) > Counts(Indices(Index1)) Or Not HighValues And Counts(Indices(Index2)) < Counts(Indices(Index1)) Then
               Temp = Indices(Index2)
               Indices(Index2) = Indices(Index1)
               Indices(Index1) = Temp
            End If
         Next Index2
      Next Index1
   End If
   
   ' Set return ValueIndex
   If MaxCount < 1 Then
      MaxCount = ValueIndex.Count
   End If
   For Index1 = 1 To Application.Min(UBound(Result), ValueIndex.Count, MaxCount)
      Result(Index1, 1) = Values(Indices(Index1) - 1)
      Result(Index1, 2) = Counts(Indices(Index1))
   Next Index1
   
   ' Add note if destination range not large enough
   If MaxCount > UBound(Result) Then
      Result(UBound(Result)) = "More..."
   End If
   
   ' Return results
   RankedValuesByFrequency = Result

End Function

[End Code Segment]

To add VBA code to a regular or general module in an Excel workbook, press ALT+F11 to open the VBA development environment (VBE). Select the menu command Insert->Module to create a new VBA module. Paste the code into the document window that appears. Press ALT+F11 to return to the Excel workbook.

Kevin
Fundamentals of JavaScript

Learn the fundamentals of the popular programming language JavaScript so that you can explore the realm of web development.

zorvek (Kevin Jones)ConsultantCommented:
An improved version of the above UDF. Same instructions.

[Begin Code Segment]


Public Function RankedValuesByFrequency( _
      ByRef ValueList As Range, _
      Optional ByVal MaxCount As Long, _
      Optional ByVal HighValues As Boolean = True _
   ) As Variant

' Create a list of the most or least frequently occurring tokens in a range in
' descending or ascending order by frequency along with the number of
' occurrences. If used as a UDF it must be entered as a multiple-cell array
' formula in a range two columns wide.
'
' Syntax
'
' RankedValuesByFrequency(ValueList, MaxCount, [HighValues])
'
' ValueList - An array of text values. It can be any shape or size.
'
' MaxCount - The maximum number of items to list in the results.
'
' HighValues - Specify True to list the most frequently occurring values, pass
'    False to list the least frequently occurring. Optional. If omitted then
'    True is assumed.
'
' © 2008 Kevin Jones

   Dim Result As Variant
   Dim Cell As Range
   Dim Token As Variant
   Dim Index As Long
   Dim ValueIndex As New Collection
   Dim Values As Variant
   Dim Counts() As Long
   Dim Indices() As Long
   Dim Index1 As Long
   Dim Index2 As Long
   Dim Temp As Double
   
   ReDim Result(1 To Application.Caller.Rows.Count, 1 To Application.Caller.Columns.Count)
   
   If MaxCount = 0 Then MaxCount = Application.Caller.Rows.Count + 1

   ' Inialize ValueIndex to empty so errors don't appear in spreadsheet
   For Index1 = 1 To Application.Caller.Rows.Count
      For Index2 = 1 To Application.Caller.Columns.Count
         Result(Index1, Index2) = ""
      Next Index2
   Next Index1
   
   ' Create collection of all unique ValueIndex
   Values = Array()
   On Error Resume Next
   For Each Cell In ValueList
      If Len(Cell) > 0 Then
         For Each Token In Split(Cell)
            Err.Clear
            ValueIndex.Add ValueIndex.Count + 1, CStr(Token)
            If Err.Number = 0 Then
               ReDim Preserve Values(LBound(Values) To UBound(Values) + 1)
               Values(UBound(Values)) = CStr(Token)
            End If
         Next Token
      End If
   Next Cell
   On Error GoTo 0
   
   ReDim Counts(1 To ValueIndex.Count)
   ReDim Indices(1 To ValueIndex.Count)
   
   ' Calculate counts
   For Each Cell In ValueList
      If Len(Cell) > 0 Then
         For Each Token In Split(Cell)
            Counts(ValueIndex(CStr(Token))) = Counts(ValueIndex(CStr(Token))) + 1
         Next Token
      End If
   Next Cell
   
   ' Sort counts
   For Index1 = 1 To ValueIndex.Count
      Indices(Index1) = Index1
   Next Index1
   If ValueIndex.Count > 1 Then
      For Index1 = 1 To ValueIndex.Count - 1
         For Index2 = Index1 + 1 To ValueIndex.Count
            If HighValues And Counts(Indices(Index2)) > Counts(Indices(Index1)) Or Not HighValues And Counts(Indices(Index2)) < Counts(Indices(Index1)) Then
               Temp = Indices(Index2)
               Indices(Index2) = Indices(Index1)
               Indices(Index1) = Temp
            End If
         Next Index2
      Next Index1
   End If
   
   ' Set return values
   If MaxCount < 1 Then
      MaxCount = ValueIndex.Count
   End If
   For Index1 = 1 To Application.Min(UBound(Result, 1), ValueIndex.Count, MaxCount)
      Result(Index1, 1) = Values(Indices(Index1) - 1)
      If Application.Caller.Columns.Count > 1 Then Result(Index1, 2) = Counts(Indices(Index1))
   Next Index1
   
   ' Add note if destination range is not large enough
   If MaxCount > UBound(Result, 1) And ValueIndex.Count > UBound(Result, 1) Then
      Result(UBound(Result, 1), 1) = "More..."
   End If
   
   ' Return results
   RankedValuesByFrequency = Result

End Function

[End Code Segment]

Kevin

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
kevinsmith121Author Commented:
That is BRILLIANT!!! Thank you Kevin.
canesbrCommented:
Nice routine
Here is a function that will remove punctuation.
Right now the RankedValuesByFrequency function will return separate results for say
Oracle  23
Oracle, 8
Oracle! 39
If you install the attached NoPunctuation function then you would get
Oracle 70
Regards
Brian


Function NoPunctuation(ByVal T)
    Option Base 0
    T = LTrim(RTrim(T))
    s = Array(",", ".", ";", ":", "?", "!", "/", "\", "(", ")", "[", "]", "{", "}", "'", """")
    For i = 0 To UBound(s): T = Replace(T, s(i), " "): Next i 'change punctuation to blanks
    Do Until UBound(Split(T, "  ")) = 0: T = Join(Split(T, "  "), " "): Loop 'remove extra blanks
    NoPunctuation = T
End Function

Open in new window

It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Excel

From novice to tech pro — start learning today.