• Status: Solved

HELP!!! CRITICAL!!! I'm trying to extract a URL from a text file...then write it to a text file with just the URL.

Using System.IO or something similar, I need to read a text file. This can be done until end of file or with ReadLine, but I need to extract the URL in the file. It's in the form of "rcrast@http://ww3.yahoo.com/main/Prh13b/ps20060325-178782/index.html".

I need to extract this and then write it to another text file line by line...



The file also contains many different characters that should be excluded from the extraction and they look like this:  

À‰e ¿  @F€p Ê  @:Ÿ8 è  À.y^   tœ© ! €åÙí ^   [I Š   ]Á‹   €ßÿ  Ä  ÀULC        À=}  r €¶8» ‘  ¥… Ì  €#&4 ð  Àž9   '˜+ X   ?†\ K  l¡a ` @F1á p @hÛË  €Fm È  @’K‡  €p+·€ @¶J"€í  Ëuó€ €Úù– “  • q   ºì }  €¶çÊ   %ã @ö”« š € , ˆ €dè;€¨ @? T  €û”« ®  €£Î÷ Á   ÁCË ì  @z÷       ý
brian_leighty
Asked:
 
deightonprog Commented:
Will the URL always have the string http:// in it? Then what will indicate the end of the URL in the file? Can you explain in general terms how the URLs can be recognised? Will the files have multiple URLs in them, or just one?
 
brian_leighty (Author) Commented:
I just need to pull a bunch of websites out of an index file that Internet Explorer has made... it's to log all the websites that my users go to... the http:// doesn't matter as long as I get all the web pages.
 
deightonprog Commented:
I was thinking you could use the http to spot the URLs when parsing the file. Is this a system file, e.g. index.dat? I've had problems manipulating those files in the past; they are special files, it seems.
 
brian_leighty (Author) Commented:
No, it's fine... what about using the "rcrast@" to spot them? But HTTP is fine too.
 
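deightonprog Commented:

        ' Work on a copy, because the original index.dat may be in use.
        Try
            IO.File.Delete("c:\temp.dat")
        Catch
            ' Ignore: an old copy may not exist or may not be deletable.
        End Try
        IO.File.Copy("c:\documents and settings\andyd\cookies\index.dat", "c:\temp.dat")

        Dim fs As New IO.StreamReader("c:\temp.dat")
        Dim fw As New IO.StreamWriter("c:\output.txt")

        ' Read everything and lower-case it so the search is case-insensitive.
        ' Note that this also lower-cases the extracted URLs.
        Dim s As String = fs.ReadToEnd
        s = s.ToLower

        ' Each entry looks like "cookie:user@url" followed by a null character.
        Dim i As Integer = s.IndexOf("cookie:")

        Do Until i < 0

            ' Skip ahead to just after the "@" that follows the marker.
            s = s.Substring(i)
            i = s.IndexOf("@")
            If i < 0 Then Exit Do
            s = s.Substring(i + 1)

            ' Collect characters up to the null terminator (or end of data).
            Dim surl As String = ""
            Dim j As Integer = 0
            Do While j < s.Length AndAlso Asc(s.Chars(j)) <> 0
                surl &= s.Chars(j)
                j += 1
            Loop

            fw.WriteLine(surl)

            i = s.IndexOf("cookie:")
        Loop

        fs.Close()
        fw.Close()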



 
deightonprog Commented:
If that doesn't work, then in the two places where I search for "cookie:", search for "rcrast" instead.
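One way to make that swap a single change is to pass the marker in as a parameter. The following is only a rough sketch, not the code posted above: the module name, the `ExtractAfterMarker` helper and the sample string are made up for illustration, and it assumes each entry ends with a null character, as the loop above does.

```vbnet
Imports System
Imports System.Collections.Generic

Module MarkerExtractSketch

    ' Hypothetical helper: returns the text between the "@" that follows
    ' each marker and the next null character (or the end of the text).
    Function ExtractAfterMarker(ByVal text As String, ByVal marker As String) As List(Of String)
        Dim results As New List(Of String)
        Dim pos As Integer = text.IndexOf(marker, StringComparison.OrdinalIgnoreCase)

        Do While pos >= 0
            ' The URL starts just after the "@" that follows the marker.
            Dim atPos As Integer = text.IndexOf("@"c, pos)
            If atPos < 0 Then Exit Do

            ' Assumption: the entry ends at the first null character.
            Dim finish As Integer = text.IndexOf(ChrW(0), atPos + 1)
            If finish < 0 Then finish = text.Length

            results.Add(text.Substring(atPos + 1, finish - atPos - 1))

            If finish >= text.Length Then Exit Do
            pos = text.IndexOf(marker, finish, StringComparison.OrdinalIgnoreCase)
        Loop

        Return results
    End Function

    Sub Main()
        ' Made-up sample data standing in for the contents of index.dat.
        Dim sample As String = "xx" & ChrW(0) & "Cookie:rcrast@example.com/a" & ChrW(0) & "yy"

        ' Pass "rcrast" instead of "cookie:" to search on the user name.
        For Each url As String In ExtractAfterMarker(sample, "cookie:")
            Console.WriteLine(url)
        Next
    End Sub

End Module
```

Because the search uses `StringComparison.OrdinalIgnoreCase` rather than lower-casing the whole file, the extracted URLs keep their original case.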
 
brian_leighty (Author) Commented:
That's so perfect, but what about a checkbox or something to extract by the http:// and something to end the URL?
 
deightonprog Commented:
What do you mean by 'something to end the URL'?
 
brian_leighty (Author) Commented:
I don't think I mean anything by it, because it will be the same as with "cookie:".

Something like ".html".
You don't have to worry about that; I need to try "http://".
 
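deightonprog Commented:

        ' Work on a copy, because the original index.dat may be in use.
        Try
            IO.File.Delete("c:\temp.dat")
        Catch
            ' Ignore: an old copy may not exist or may not be deletable.
        End Try
        IO.File.Copy("c:\documents and settings\andyd\cookies\index.dat", "c:\temp.dat")

        Dim fs As New IO.StreamReader("c:\temp.dat")
        Dim fw As New IO.StreamWriter("c:\output.txt")

        ' Read everything and lower-case it so the search is case-insensitive.
        ' Note that this also lower-cases the extracted URLs.
        Dim s As String = fs.ReadToEnd
        s = s.ToLower

        ' This time search for "http" directly instead of "cookie:" and "@".
        Dim i As Integer = s.IndexOf("http")

        Do Until i < 0

            ' Keep the "http" prefix as part of the extracted URL.
            s = s.Substring(i)

            ' Collect characters up to the null terminator (or end of data).
            Dim surl As String = ""
            Dim j As Integer = 0
            Do While j < s.Length AndAlso Asc(s.Chars(j)) <> 0
                surl &= s.Chars(j)
                j += 1
            Loop

            fw.WriteLine(surl)

            ' Start at 1 so the "http" at position 0 is not found again.
            i = s.IndexOf("http", 1)
        Loop

        fs.Close()
        fw.Close()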
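
On the question of "something to end the URL": besides stopping at a null character, one option is to stop at the first character that is not printable ASCII, which also keeps the binary junk out. This is only a sketch under that assumption, using a regular expression rather than the loop above. The module name, pattern and sample string are illustrative, and real index.dat entries may need a stricter pattern.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexUrlSketch

    Sub Main()
        ' Made-up sample data: a URL surrounded by null characters and other text.
        Dim sample As String = "xx" & ChrW(0) & "rcrast@http://example.com/main/index.html" & ChrW(0) & "yy"

        ' "http://" or "https://" followed by a run of printable,
        ' non-space ASCII characters (codes &H21 to &H7E).
        ' Assumption: anything outside that range ends the URL.
        Dim pattern As String = "https?://[\x21-\x7E]+"

        For Each m As Match In Regex.Matches(sample, pattern, RegexOptions.IgnoreCase)
            Console.WriteLine(m.Value)
        Next
    End Sub

End Module
```

With the sample string above this prints http://example.com/main/index.html. To write the matches to a file instead, replace `Console.WriteLine` with a call to `WriteLine` on a `StreamWriter`, as in the code above.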




Question has a verified solution.
