Asked by Mike_Stevens (United States of America)

Search text for a four-character string, counting the occurrences and their locations, using VB.NET

I have a text file that I need to search for a four-character string. I need to count the number of occurrences and, if the string is found, get the location of each occurrence. I have no problem opening the file and reading the content; I'm just having an issue figuring out the search.

To be more specific, I need to find the string "ISA*" in the text stored in the variable "xInput", count the occurrences, and get the location of each so that it can be used later.

I've seen several examples online, but I can't seem to get any of them to work. Any help would be appreciated.

Thanks
Mike
ChloesDad (United Kingdom of Great Britain and Northern Ireland) replied:

This should do the job:

        Dim StringFinder As New StringFinderClass

        StringFinder.SourceString = "I have a text file that I need to search looking for a four character string.  I need to count the number of occurrences and if the four character string is found get the location of each occurrence.   I have to problem opening the file and reading the content.   I'm just having a issue with figuring out the search."
        StringFinder.SearchString = "to"

        StringFinder.FindMatches()

        ' Matches is a collection, so join it to print the positions themselves
        Console.WriteLine(String.Join(", ", StringFinder.Matches))

        Console.WriteLine(StringFinder.NumberOfMatches)



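Public Class StringFinderClass

    Public Property SearchString As String
    Public Property SourceString As String

    Public Property NumberOfMatches As Integer = 0
    ' 1-based positions of each match, in the order they were found
    Public Property Matches As List(Of Integer) = New List(Of Integer)

    Public Sub FindMatches()

        ' Reset the results so the method can safely be called more than once
        NumberOfMatches = 0
        Matches.Clear()

        Dim Count1 As Integer
        Dim LastStart As Integer = SourceString.Length - SearchString.Length

        ' Try every start position where the search string can still fit
        For Count1 = 0 To LastStart

            ' Ordinal comparison: exact character match, no substring allocation
            If String.CompareOrdinal(SourceString, Count1, SearchString, 0, SearchString.Length) = 0 Then

                NumberOfMatches += 1
                Matches.Add(Count1 + 1)  ' report 1-based positions

            End If

        Next Count1

    End Sub

End Class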


ASKER CERTIFIED SOLUTION by Fernando Soto (United States of America)

Mike_Stevens (ASKER):

I need to find all four characters (ISA*) together. I am using a StreamReader to read the contents of the text file:

 xInput = objReader.ReadToEnd()
Hi Mike,

This code snippet will do what you need.

Imports System.Text.RegularExpressions


'' Your returned string from the StreamReader
Dim xInput = objReader.ReadToEnd()
'' Regex Pattern to search the string
Dim pattern As String = "(ISA*)"
'' Locate the matches of the pattern
Dim mc As MatchCollection = Regex.Matches(xInput, pattern)
'' Number of occurrences / Matches found
Dim totalCount As Integer = mc.Count
'' A list of start locations (0-based) for each occurrence from the beginning of the string.
Dim locations As List(Of Integer) = (From m As Match In mc Select m.Index).ToList()


Jacques... your example works. Thank you. The test file that I am using has the value 'ISA*' in it twice. The first occurrence is the first four characters of the file, and the second is later on in the text. Your example finds the second occurrence but not the first. An example of the first line of the file is:

ISA*00*BLUE*YELLOW*GREEN......

The other occurrence is later in the file, in the middle of other text.

I don't know why it won't find the first one.
Hi Mike, did you check out my solution?
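Just a little adjustment. IndexOf(value, startIndex) does examine the character at startIndex, but because the loop always searches from index + 1, starting index at 0 made the first search begin at position 1, so an "ISA*" at position 0 was skipped.

Simply initialize index to -1 so that the first search begins at position 0: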

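		' Start at -1 so the first IndexOf call begins at position 0
		Dim index As Integer = -1
		Dim count As Integer = 0

		While index < xInput.Length
			' Ordinal = exact, case-sensitive match of the characters "ISA*"
			index = xInput.IndexOf("ISA*", index + 1, StringComparison.Ordinal)
			If index = -1 Then Exit While
			count = count + 1
			Console.WriteLine("Occurrence " & count & " at position " & index)
		End While
		Console.WriteLine("Count : " & count)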
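Since the question also asks for the locations to be kept for later use, the loop above could be wrapped in a function that collects the positions in a list instead of only printing them. This is a minimal sketch, assuming the whole file has already been read into a string; the names FindAllPositions and IsaSearch and the sample text are hypothetical, chosen for illustration only.

```vbnet
Imports System
Imports System.Collections.Generic

Module IsaSearch

    ' Returns the 0-based start position of every occurrence of target in text.
    ' StringComparison.Ordinal gives an exact, case-sensitive character match.
    Function FindAllPositions(text As String, target As String) As List(Of Integer)
        Dim positions As New List(Of Integer)
        If String.IsNullOrEmpty(text) OrElse String.IsNullOrEmpty(target) Then Return positions

        ' Start at -1 so the first search begins at position 0
        Dim index As Integer = -1
        Do
            ' Resume one character past the previous hit
            index = text.IndexOf(target, index + 1, StringComparison.Ordinal)
            If index = -1 Then Exit Do
            positions.Add(index)
        Loop

        Return positions
    End Function

    Sub Main()
        ' Hypothetical sample text shaped like the file described in the question
        Dim xInput As String = "ISA*00*BLUE*YELLOW*GREEN~ some other text ISA*01*RED~"
        Dim positions As List(Of Integer) = FindAllPositions(xInput, "ISA*")

        Console.WriteLine("Count : " & positions.Count)
        For Each p As Integer In positions
            Console.WriteLine("Occurrence at position " & p)
        Next
    End Sub

End Module
```

Because each search resumes one character past the previous hit, overlapping matches would also be reported; for "ISA*" this cannot happen, since no suffix of the string equals a prefix of it.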


Fernando... I tried yours, and it's nice, clean, and compact, but when I search for "ISA*" with the asterisk included it does not return the correct number of occurrences. ISA* appears twice in my test file, but it reports four occurrences. If I remove the asterisk it works fine, but I must include the asterisk to prevent text that contains "ISA" from being returned.
Sorry Mike, the * has a special meaning in regular expressions: it matches zero or more of the preceding character, so the pattern "ISA*" also matches "IS", "ISAA", and so on. To match the * as a normal character, it must be escaped with a backslash, as shown below. Please use this statement in place of the original pattern line.

Dim pattern As String = "(ISA\*)"
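Instead of escaping the asterisk by hand, the pattern could also be built with Regex.Escape, which escapes * and any other regular-expression metacharacters in the search text. This is a minimal sketch, not the code from the thread; FindLiteralMatches, IsaRegexSearch, and the sample text are hypothetical names for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module IsaRegexSearch

    ' Returns the 0-based index of every literal occurrence of target in text.
    ' Regex.Escape turns "ISA*" into "ISA\*", so the asterisk is matched literally.
    Function FindLiteralMatches(text As String, target As String) As List(Of Integer)
        Dim pattern As String = Regex.Escape(target)
        Dim mc As MatchCollection = Regex.Matches(text, pattern)
        Return (From m As Match In mc Select m.Index).ToList()
    End Function

    Sub Main()
        ' Hypothetical sample text: "ISAAC" contains "ISA" but not "ISA*"
        Dim xInput As String = "ISA*00*BLUE*YELLOW*GREEN~ ISAAC ISA*01*RED~"
        Dim locations As List(Of Integer) = FindLiteralMatches(xInput, "ISA*")

        Console.WriteLine("Count : " & locations.Count)
        Console.WriteLine("Locations : " & String.Join(", ", locations))
    End Sub

End Module
```

In the sample, "ISAAC" is not counted, because the escaped pattern requires the literal asterisk after "ISA".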
