Solved

VB.NET Word Count algorithm

Posted on 2009-04-06
Last Modified: 2012-05-06
Can anyone suggest (a) a model algorithm to output the number of words in the text in the code window and (b) a separate algorithm to count how often each word appears in that text?

pythonV
The vessel, which is operated by an Italian company, carried a crew of 24, from Bulgaria, Ukraine, Russia and the Philippines, Britain's Telegraph newspaper reported.

Question by:pythonV

Accepted Solution

by:
ChloesDad earned 250 total points
ID: 24082299
Hi, the first one is fairly easy. There is a String.Split function which returns a string array:

Dim words() As String = myText.Split(" "c)
Dim numberOfWords As Integer = words.Length

You can then use the array to work out the number of times each word appears. Although it is not a problem with the given text, punctuation would have to be removed from the original string first; otherwise "vessel," and "vessel" would be treated as different words.
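A minimal sketch of that approach, under a few assumptions that are not in the answer above: every punctuation character is dropped (so "Britain's" becomes "Britains"), repeated whitespace is skipped, and "The" and "the" count as the same word. The names WordCountSketch, RemovePunctuation, SplitWords and CountWordFrequencies are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports Microsoft.VisualBasic

Module WordCountSketch

    ' Drops every punctuation character, so "vessel," becomes "vessel".
    Function RemovePunctuation(ByVal text As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In text
            If Not Char.IsPunctuation(c) Then sb.Append(c)
        Next
        Return sb.ToString()
    End Function

    ' Splits on whitespace and skips the empty entries that repeated spaces would produce.
    Function SplitWords(ByVal text As String) As String()
        Dim separators() As Char = {" "c, ControlChars.Tab, ControlChars.Cr, ControlChars.Lf}
        Return RemovePunctuation(text).Split(separators, StringSplitOptions.RemoveEmptyEntries)
    End Function

    ' Tallies each word; the comparer is an assumption that makes the count case-insensitive.
    Function CountWordFrequencies(ByVal text As String) As Dictionary(Of String, Integer)
        Dim counts As New Dictionary(Of String, Integer)(StringComparer.OrdinalIgnoreCase)
        For Each word As String In SplitWords(text)
            Dim current As Integer = 0
            counts.TryGetValue(word, current) ' leaves current at 0 for a new word
            counts(word) = current + 1
        Next
        Return counts
    End Function

    Sub Main()
        Dim sample As String = "The vessel, the crew and the vessel."
        Console.WriteLine("Words: {0}", SplitWords(sample).Length)
        For Each kvp As KeyValuePair(Of String, Integer) In CountWordFrequencies(sample)
            Console.WriteLine("{0}: {1}", kvp.Key, kvp.Value)
        Next
    End Sub

End Module
```

This version needs no LINQ, so it should also work on .NET Framework 2.0. If apostrophes or hyphens inside words matter, the punctuation test would need to be narrowed.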

Assisted Solution

by:MrClyfar
MrClyfar earned 250 total points
ID: 24082418
Hi there.
Here's a quick and dirty way to get the number of times each word is used in the string. As ChloesDad noted, you may need to change the code to handle characters such as ' : ; and so on.
You need .NET Framework 3.5 for my code example to work, because it uses LINQ.
Hope this helps.
Jas.

Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module Module1

    Sub Main()
        Dim inputString As String = "The vessel, which is operated by an Italian company, carried a crew of 24, from the country of Bulgaria, Ukraine, Russia and the Philippines, Britain's Telegraph newspaper reported."

        Dim wordCounts As Dictionary(Of String, Integer) = GetWordUsageCount(inputString)

        For Each kvp As KeyValuePair(Of String, Integer) In wordCounts
            Console.WriteLine("Word = {0}, Count = {1}", kvp.Key, kvp.Value)
        Next
    End Sub

    ' Counts how often each word occurs, ignoring case.
    Private Function GetWordUsageCount(ByVal input As String) As Dictionary(Of String, Integer)
        ' A word is any run of characters other than space, tab, newline or comma.
        ' Other punctuation (such as the final period in "reported.") stays attached to the word.
        Dim m As MatchCollection = Regex.Matches(input, "[^ \t\n,]+")

        Dim words = (From word As Match In m _
                     Select word.Value).ToList()

        ' Group the lower-cased words and count the members of each group.
        Dim wordGroups = From word In words _
                         Group By wordKey = word.ToLower() _
                         Into wordCount = Count()

        Return wordGroups.ToDictionary(Function(g) g.wordKey, Function(g) g.wordCount)
    End Function

End Module
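To deal with the punctuation caveat, one option is to match the words themselves instead of the gaps between them. The sketch below is only an illustration: it assumes a word is a run of letters, digits or apostrophes, and it also sorts the result by descending count. The pattern and the names RegexWordCountSketch, WordPattern and GetWordsByFrequency are not part of the code above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module RegexWordCountSketch

    ' Assumption: a word is a run of Unicode letters, digits or apostrophes.
    Private Const WordPattern As String = "[\p{L}\p{N}']+"

    ' Returns (word, count) pairs, most frequent first; ties are ordered alphabetically.
    Function GetWordsByFrequency(ByVal input As String) As List(Of KeyValuePair(Of String, Integer))
        Dim query = From m As Match In Regex.Matches(input, WordPattern) _
                    Group By word = m.Value.ToLowerInvariant() Into occurrences = Count() _
                    Order By occurrences Descending, word Ascending _
                    Select New KeyValuePair(Of String, Integer)(word, occurrences)
        Return query.ToList()
    End Function

    Sub Main()
        Dim sample As String = "The vessel, the crew; the VESSEL."
        For Each kvp As KeyValuePair(Of String, Integer) In GetWordsByFrequency(sample)
            Console.WriteLine("Word = {0}, Count = {1}", kvp.Key, kvp.Value)
        Next
    End Sub

End Module
```

With this pattern "Britain's" stays one word, while commas, colons, semicolons and periods never become part of a match. A quoted word such as 'vessel' would keep its apostrophes, so the pattern may need tightening for other texts.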
