Mr_Fulano asked:

Finding Subparts of Strings in TXT Doc

Hi,
I have a TXT document in which I want to search for a particular substring within other strings. I can currently find the substring when it stands alone (not as part of a larger string), but I would also like to find it inside larger strings. For example, if my substring is "dri", I would like the output of Console.WriteLine() to be as follows:

drive
driver
driving
drivable
dried
drink

...and so on.  Therefore, the code would have to find the sub-string (in this case "dri") and return the larger string everywhere in my document where the sub-part is found.

Thank you,
Fulano
JackOfPH

Sub FindWord(Byval SearchWord as string, Byval PathOfTextFiles as string)

Dim arrWord() As String = IO.File.ReadAllLines(PathOfTextFile)
Dim arrExtractedWord = arrContacts(ctr).Split(" ")

   
For ctr as integer = 0 to arrExtractedWord.length - 1

if instr(0,SearchWord,arrExtractedWord(ctr).tostring)<> 0  then

console.writeline(arrExtractedWord(ctr).tostring)

next

end sub
Dirk Haest
       Dim oRead As System.IO.StreamReader
        Dim LineIn As String

        oRead = IO.File.OpenText("C:\temp\subparts.txt")

        While oRead.Peek <> -1
            LineIn = oRead.ReadLine()
            Dim i As Integer
            Dim tempString As String
            i = 0
            tempString = LineIn
            While tempString.IndexOf("dri") > 0
                If tempString.IndexOf(" ", tempString.IndexOf("dri")) > 0 Then
                    Console.WriteLine(tempString.Substring(tempString.IndexOf("dri"), tempString.IndexOf(" ", tempString.IndexOf("dri")) - tempString.IndexOf("dri")))
                    tempString = tempString.Substring(tempString.IndexOf("dri") + 3)
                Else
                    Console.WriteLine(tempString.Substring(tempString.IndexOf("dri")))
                    tempString = tempString.Substring(tempString.IndexOf("dri") + 3)

                End If
            End While

        End While

        oRead.Close()
To JackOfPh
There are some issues with your code:
1) Dim arrWord() As String = IO.File.ReadAllLines(PathOfTextFile)
   must be Dim arrWord() As String = IO.File.ReadAllLines(PathOfTextFiles)
2) How are arrExtractedWord and ctr declared?
3) InStr is not the usual OO way.
4) When I test your code with the file below, I get errors!

this is my drive
The driver is driving an old chevy.
drivable
dried
drin
Adjusted the code of JackOfPH (which also returned the result from the first line in the file):

    Sub FindWord(ByVal SearchWord As String, ByVal PathOfTextFiles As String)
        Dim arrWord() As String = IO.File.ReadAllLines(PathOfTextFiles)
        ' Outer loop: one pass per line of the file (last valid index is Length - 1).
        For ctrLines As Integer = 0 To arrWord.Length - 1
            ' Split the current line into words.
            Dim arrExtractedWord() As String = arrWord(ctrLines).Split(" "c)
            For ctr As Integer = 0 To arrExtractedWord.Length - 1
                ' IndexOf returns -1 when SearchWord is not part of the word.
                If arrExtractedWord(ctr).IndexOf(SearchWord) >= 0 Then
                    Console.WriteLine(arrExtractedWord(ctr))
                End If
            Next
        Next
    End Sub

Dhaest Thanks,



Dim arrWord() As String = IO.File.ReadAllLines(PathOfTextFile)
Dim arrExtractedWord() = arrContacts(ctr).Split(" ")

   
For ctr as integer = 0 to arrExtractedWord.length - 1

if instr(0,SearchWord,arrExtractedWord(ctr).tostring)<> 0  then

console.writeline(arrExtractedWord(ctr).tostring)

next
Mr_Fulano

You can also use a regular expression:

Imports System
Imports System.Text.RegularExpressions

Public Class Test

    Public Shared Sub Main()

        ' Define a regular expression for words that begin with "jum".
        ' \b is a word boundary; \w* matches the rest of the word.
        Dim rx As New Regex("\bjum\w*")

        ' Define a test string.
        Dim text As String = "The the quick brown fox  fox jumped over the lazy dog dog."

        ' Find matches.
        Dim matches As MatchCollection = rx.Matches(text)

        ' Report the number of matches found.
        Console.WriteLine("{0} matches found.", matches.Count)

        ' Report on each match.
        For Each match As Match In matches
            Console.WriteLine("{0} found at position {1}", match.Value, match.Index)
        Next

    End Sub

End Class

vbturbo
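
The regular-expression idea can be extended to an arbitrary search term. The following is only a sketch, assuming the goal is whole words that begin with the term; the module name PrefixRegexDemo and the helper FindWordsStartingWith are made up for illustration, while Regex.Escape and RegexOptions.IgnoreCase are standard .NET.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PrefixRegexDemo

    ' Hypothetical helper: returns every whole word in text that begins with prefix.
    Function FindWordsStartingWith(ByVal text As String, ByVal prefix As String) As List(Of String)
        ' Regex.Escape keeps characters such as "." or "+" in the prefix literal.
        ' \b anchors the match at a word boundary; \w* consumes the rest of the word.
        Dim rx As New Regex("\b" & Regex.Escape(prefix) & "\w*", RegexOptions.IgnoreCase)
        Dim found As New List(Of String)
        For Each m As Match In rx.Matches(text)
            found.Add(m.Value)
        Next
        Return found
    End Function

    Sub Main()
        ' "Madrid" contains "dri" in the middle of the word, so it is skipped.
        Dim sample As String = "The driver is driving an old chevy to Madrid."
        For Each word As String In FindWordsStartingWith(sample, "dri")
            Console.WriteLine(word)
        Next
    End Sub

End Module
```

With the sample sentence this should print driver and driving. Removing the leading \b would turn it into an "anywhere in the word" search that returns the term plus whatever follows it.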
Mr_Fulano (ASKER)

Hi Dhaest,

I really like your code in post [ID:19623047 Author:Dhaest Date:08.03.2007 at 03:03AM EDT]. It does exactly what I need.

I will select your answer, and in fact, have increased the points to 500 for all the work you did to assist me. However, could you please explain a little bit what the code is actually doing? Also, what is the "3" for? Is it because "dri" has 3 characters?

Also, thank you to all that contributed. I appreciate your help.

Thanks,
Fulano
Hi Dhaest, I may have spoken too soon...I tested your code a bit more and found a problem, which I attribute to the way I explained what I needed to do. Your code is finding the substring - anywhere - in any of the words in my TXT document and returning the substring + whatever follows that.

I forgot to explain that I need it to only find words that - begin - with the substring.  So, as an example, if we use "par" as our substring, I need to find,
part
party
parent
parental ... and so on.

But it would need to skip words like:
subpart
apart
apartment
compartment ...etc.

Right now it's returning things like:
part              which is a subset of subpart
partment      which is a subset of apartment
partment      which is a subset of compartment...etc.

Also, the words in the TXT document are listed one word per line, one after another, in case that makes a difference. It's just a long list of terms.

Thanks,
Fulano
Hi Dhaest, I figured out the problem...

I added a String variable called subString and replaced the "dri" with subString.

So, "While tempString.IndexOf(subString) > 0, should read "While tempString.IndexOf("dri") =  0"

That way it begins at index 0 of each word. Now it works, but you can check it and make sure I didn't goof up somehow.

I'd still like you to provide a brief synopsis of what the code is doing to help me learn. I don't understand all your steps and would like to learn more about what is actually happening in the code.

Thanks VERY much again,
Fulano

oops...typo
What I really meant was that it should read like this...

So, "While tempString.IndexOf(subString) > 0"

should read

"While tempString.IndexOf(subString) =  0"   << works on words that - begin -  with substring.
: )
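
Because the file holds one word per line, testing IndexOf(subString) = 0 amounts to asking whether the line starts with the substring, which String.StartsWith states directly. A minimal sketch under that assumption; the module name StartsWithDemo and the helper WordsStartingWith are hypothetical, and the array stands in for the lines read from the file.

```vbnet
Imports System
Imports System.Collections.Generic

Module StartsWithDemo

    ' Hypothetical helper: returns the entries that begin with prefix.
    Function WordsStartingWith(ByVal lines As IEnumerable(Of String), ByVal prefix As String) As List(Of String)
        Dim found As New List(Of String)
        For Each entry As String In lines
            ' Trim stray spaces so the comparison starts at the first letter.
            Dim word As String = entry.Trim()
            ' StringComparison.Ordinal makes the test case-sensitive and culture-independent.
            If word.StartsWith(prefix, StringComparison.Ordinal) Then
                found.Add(word)
            End If
        Next
        Return found
    End Function

    Sub Main()
        ' In-memory stand-in for IO.File.ReadAllLines(path) on a one-word-per-line file.
        Dim lines() As String = {"part", "party", "subpart", "apartment", "parent", "compartment"}
        For Each word As String In WordsStartingWith(lines, "par")
            Console.WriteLine(word)
        Next
    End Sub

End Module
```

For the sample list this should print part, party and parent, and skip subpart, apartment and compartment.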
 
Hi, I'm glad you figured out that last piece (= 0) by yourself.
Glad I could help you with the greatest part of the solution.

>> Is it because "dri" has 3 characters
If I add 3 to the index, those 3 characters are gone, so IndexOf won't find them anymore and will go on to the next word in the same sentence.
>>If I add 3 to the index, those 3 characters are gone, so IndexOf won't find them anymore and will go on to the next word in the same sentence.<<

OK, so if my subpart is 5 characters long, I need to change the 3 to a 5. Right?

Could you also briefly explain the code a little.

Thanks,
Fulano
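
One way to avoid changing the 3 by hand is to use the Length of the search term. The sketch below is one reading of the loop posted earlier in the thread, with comments on each step; it is not the members-only accepted answer. The names SubstringLoopDemo and ExtractFromTerm are hypothetical, and the loop test uses >= 0 so that a hit at position 0 is also reported.

```vbnet
Imports System
Imports System.Collections.Generic

Module SubstringLoopDemo

    ' Hypothetical helper: for every occurrence of term in line, collects the
    ' term plus whatever follows it up to the next space.
    Function ExtractFromTerm(ByVal line As String, ByVal term As String) As List(Of String)
        Dim found As New List(Of String)
        ' An empty term would match everywhere and never advance, so stop early.
        If term.Length = 0 Then Return found
        Dim rest As String = line
        ' IndexOf returns -1 when the term is absent.
        Dim pos As Integer = rest.IndexOf(term, StringComparison.Ordinal)
        While pos >= 0
            ' Look for the first space at or after the occurrence.
            Dim spacePos As Integer = rest.IndexOf(" "c, pos)
            If spacePos >= 0 Then
                ' A space follows: take the characters from the term up to that space.
                found.Add(rest.Substring(pos, spacePos - pos))
            Else
                ' No space follows: take everything to the end of the line.
                found.Add(rest.Substring(pos))
            End If
            ' Cut off everything up to and including this occurrence of the term;
            ' term.Length replaces the hard-coded 3, so any term length works.
            rest = rest.Substring(pos + term.Length)
            pos = rest.IndexOf(term, StringComparison.Ordinal)
        End While
        Return found
    End Function

    Sub Main()
        For Each piece As String In ExtractFromTerm("The driver is driving an old chevy.", "dri")
            Console.WriteLine(piece)
        Next
        ' A mid-word hit returns only the tail of the word.
        For Each piece As String In ExtractFromTerm("apartment", "par")
            Console.WriteLine(piece)
        Next
    End Sub

End Module
```

The first loop should print driver and driving; the second prints partment, which is the "anywhere in the word" behaviour described above.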
ASKER CERTIFIED SOLUTION
Dirk Haest
Thanks Dhaest. Greatly appreciate the help!

FDT