  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 214
  • Last Modified:

HELP!!! CRITICAL!!! I'm trying to extract a URL from a text file...then write it to a text file with just the URL.

Using System.IO or something similar, I need to read a text file. This can be done until end of file or with ReadLine, but I need to extract the URL in the file. It's in the form of "rcrast@http://ww3.yahoo.com/main/Prh13b/ps20060325-178782/index.html".

I need to extract this and then write it to another text file line by line...



The file also contains many different characters that should be excluded from the extraction and they look like this:  

À‰e ¿  @F€p Ê  @:Ÿ8 è  À.y^   tœ© ! €åÙí ^   [I Š   ]Á‹   €ßÿ  Ä  ÀULC        À=}  r €¶8» ‘  ¥… Ì  €#&4 ð  Àž9   '˜+ X   ?†\ K  l¡a ` @F1á p @hÛË  €Fm È  @’K‡  €p+·€ @¶J"€í  Ëuó€ €Úù– “  • q   ºì }  €¶çÊ   %ã @ö”« š € , ˆ €dè;€¨ @? T  €û”« ®  €£Î÷ Á   ÁCË ì  @z÷       ý
0
brian_leighty
Asked:
brian_leighty
1 Solution
 
deightonCommented:
Will the URL always have the string http:// in it? Then what will indicate the end of the URL in the file? Can you explain in general terms how the URLs can be recognised? Will the files have multiple URLs in them, or just one?
0
 
brian_leightyAuthor Commented:
I just need to pull a bunch of websites out of an index file that Internet Explorer has made. It's to log all the websites that my users go to. The http:// doesn't matter as long as I get all the web pages.
0
 
deightonCommented:
I was thinking you could use the http to spot the URLs when parsing the file. Is this a system file, e.g. index.dat? I've had problems manipulating those files in the past; they are special files, it seems.
0

 
brian_leightyAuthor Commented:
No, it's fine. What about using "rcrast@" to spot them? But HTTP is fine too.
0
 
deightonCommented:


        ' Work on a copy, because the live index.dat is usually locked.
        ' File.Delete and File.Copy are shared methods, so no instance is needed.
        Try
            IO.File.Delete("c:\temp.dat")
        Catch
        End Try
        IO.File.Copy("c:\documents and settings\andyd\cookies\index.dat", "c:\temp.dat")

        Dim fs As New IO.StreamReader("c:\temp.dat")
        Dim fw As New IO.StreamWriter("c:\output.txt")

        Dim s As String = fs.ReadToEnd
        s = s.ToLower

        Dim i As Integer = s.IndexOf("cookie:")

        Do Until i < 0

            ' Skip past the "cookie:user@" prefix.
            s = s.Substring(i)
            i = s.IndexOf("@")
            If i < 0 Then Exit Do
            s = s.Substring(i + 1)

            Dim surl As String = ""

            ' Copy characters up to the terminating null, or the end of the data.
            Dim j As Integer = 0
            Do While j < s.Length AndAlso s.Chars(j) <> ChrW(0)
                surl &= s.Chars(j)
                j += 1
            Loop

            fw.WriteLine(surl)

            ' Look for the next entry in what is left of the data.
            i = s.IndexOf("cookie:")
        Loop

        fs.Close()
        fw.Close()



0
 
deightonCommented:
If that doesn't work, then in the two places where I search for cookie:, search for rcrast instead.
0
 
brian_leightyAuthor Commented:
That's so perfect, but what about a checkbox or something to extract by the http:// and something to end the URL?
0
 
deightonCommented:
What do you mean by 'something to end the url'?
0
 
brian_leightyAuthor Commented:
I don't think I mean anything by it, because it will be the same as with "cookie:".

Something like ".html".
You don't have to worry about that. I need to try "http://".
0
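One possible reading of "something to end the URL", as a minimal sketch rather than a definitive answer: treat the URL as everything from "http://" up to the first null, control character, or whitespace. The helper name `ReadUrlAt` and the sample string with a trailing null are assumptions made for illustration; they are not taken from the real index file.

```vbnet
Imports System

Module UrlEndSketch
    ' Hypothetical helper: returns the run of characters beginning at
    ' position start and stopping at the first control character or
    ' whitespace, or at the end of the text.
    Function ReadUrlAt(ByVal text As String, ByVal start As Integer) As String
        Dim j As Integer = start
        Do While j < text.Length AndAlso
                 Not Char.IsControl(text.Chars(j)) AndAlso
                 Not Char.IsWhiteSpace(text.Chars(j))
            j += 1
        Loop
        Return text.Substring(start, j - start)
    End Function

    Sub Main()
        ' Assumed sample: the URL from the question followed by a null and junk.
        Dim sample As String = "rcrast@http://ww3.yahoo.com/main/Prh13b/ps20060325-178782/index.html" & ChrW(0) & "junk"
        Dim i As Integer = sample.IndexOf("http://")
        If i >= 0 Then
            Console.WriteLine(ReadUrlAt(sample, i))
        End If
    End Sub
End Module
```

With this sample the helper stops at the null, so only the URL itself is printed. Stopping on any control character is a little more forgiving than testing for a null alone, at the cost of possibly cutting short an entry that really does contain one.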
 
deightonCommented:
        ' Work on a copy, because the live index.dat is usually locked.
        Try
            IO.File.Delete("c:\temp.dat")
        Catch
        End Try
        IO.File.Copy("c:\documents and settings\andyd\cookies\index.dat", "c:\temp.dat")

        Dim fs As New IO.StreamReader("c:\temp.dat")
        Dim fw As New IO.StreamWriter("c:\output.txt")

        Dim s As String = fs.ReadToEnd
        s = s.ToLower

        Dim i As Integer = s.IndexOf("http")

        Do Until i < 0

            ' Drop everything before the "http" that was just found.
            s = s.Substring(i)

            Dim surl As String = ""

            ' Copy characters up to the terminating null, or the end of the data.
            Dim j As Integer = 0
            Do While j < s.Length AndAlso s.Chars(j) <> ChrW(0)
                surl &= s.Chars(j)
                j += 1
            Loop

            fw.WriteLine(surl)

            ' Start at index 1 so the "http" at position 0 is not found again.
            i = s.IndexOf("http", 1)
        Loop

        fs.Close()
        fw.Close()




0
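As an alternative to walking the string by hand, the same extraction can be sketched with a regular expression. This is only a sketch under stated assumptions: the paths are the ones used in the code above, the pattern (http:// or https:// followed by printable, non-space ASCII) is a guess at what counts as a URL in this file, and the module name is made up.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module UrlRegexSketch
    Sub Main()
        ' Assumed paths, matching the copy made in the code above.
        Dim inputPath As String = "c:\temp.dat"
        Dim outputPath As String = "c:\output.txt"

        ' ISO-8859-1 maps every byte to exactly one character, so the
        ' binary parts of the file decode without being dropped or merged.
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        Dim text As String = latin1.GetString(File.ReadAllBytes(inputPath))

        ' Assumed pattern: a URL ends at the first byte that is not a
        ' printable, non-space ASCII character (a null, for example).
        Dim pattern As New Regex("https?://[\x21-\x7E]+", RegexOptions.IgnoreCase)

        Using writer As New StreamWriter(outputPath)
            For Each m As Match In pattern.Matches(text)
                writer.WriteLine(m.Value)
            Next
        End Using
    End Sub
End Module
```

Unlike the loop above, this does not lowercase the data, so the URLs are written exactly as they are stored. It may also write the same URL more than once if the file records it several times.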
