brian_leighty
HELP!!! CRITICAL!!! I'm trying to extract a URL from a text file...then write it to a text file with just the URL.
Using System.IO or something similar I need to read a text file. This can be done until end of file or with ReadLine, but I need to extract the URL in the file. It's in the form of "rcrast@http://ww3.yahoo.com/main/Prh13b/ps20060325-178782/index.html".
I need to extract this and then write it to another text file line by line...
The file also contains many different characters that should be excluded from the extraction and they look like this:
À‰e ¿ @F€p Ê @:Ÿ8 è À.y^ tœ© ! €åÙí ^ [I Š ]Á‹ €ßÿ Ä ÀULC À=} r €¶8» ‘ ¥… Ì €#&4 ð Àž9 '˜+ X ?†\ K l¡a ` @F1á p @hÛË €Fm È @’K‡ €p+·€ @¶J"€í Ëuó€ €Úù– “ • q ºì } €¶çÊ %ã @ö”« š € , ˆ €dè;€¨ @? T €û”« ® €£Î÷ Á ÁCË ì @z÷ ý
Will the URL always have the string http:// in it? Then what will indicate the end of the URL in the file? Can you explain in general terms how the URLs can be recognised? Will the files have multiple URLs in them, or just one?
ASKER
I just need to pull a bunch of websites out of an index file that Internet Explorer has made. It's to log all the websites that my users go to. The http:// doesn't matter as long as I get all the web pages.
I was thinking you could use the http to spot the URLs when parsing the file. Is this a system file, e.g. index.dat? I've had problems manipulating those files in the past; they are special files, it seems.
ASKER
No, it's fine... what about using "rcrast@" to spot the URLs? But http is fine.
' Work on a copy so the original index file is left untouched.
Try
    IO.File.Delete("c:\temp.dat")
Catch
    ' Ignore the error if the old copy cannot be deleted.
End Try
IO.File.Copy("c:\documents and settings\andyd\cookies\ind
Dim fs As New IO.StreamReader("c:\temp.d
Dim fw As New IO.StreamWriter("c:\output
Dim s As String = fs.ReadToEnd
s = s.ToLower
Dim bdone As Boolean
' Each entry starts with "cookie:"; the URL follows the "@".
Dim i As Integer = s.IndexOf("cookie:".ToLowe
Do Until i < 0
    s = s.Substring(i)
    i = s.IndexOf("@")
    s = s.Substring(i + 1)
    bdone = False
    Dim surl As String = ""
    Dim j As Integer = 0
    Do Until bdone
        ' Stop at the null character that ends the URL, or at the end of the data.
        If j >= s.Length OrElse Asc(s.Substring(j, 1)) = 0 Then
            bdone = True
        Else
            surl += s.Substring(j, 1)
            j += 1
        End If
    Loop
    fw.WriteLine(surl)
    i = s.IndexOf("cookie:".ToLowe
Loop
fs.Close()
fw.Close()
If that doesn't work, then in the two places where I search for cookie:, search for rcrast instead.
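One way to avoid editing the search string in two places is to pass the marker in as a parameter. What follows is only a rough sketch, not the code posted above: the name ExtractAfterMarker and the sample data are made up for illustration, it assumes .NET 2.0 or later (for List(Of String)), and it assumes each entry runs from the first "@" after the marker up to the next null character.

```vbnet
Imports System
Imports System.Collections.Generic

Module MarkerDemo
    ' Hypothetical helper: for each occurrence of marker, returns the text
    ' between the following "@" and the next null character (or end of data).
    Function ExtractAfterMarker(ByVal data As String, ByVal marker As String) As List(Of String)
        Dim results As New List(Of String)
        Dim pos As Integer = data.IndexOf(marker, StringComparison.OrdinalIgnoreCase)
        Do While pos >= 0
            Dim atPos As Integer = data.IndexOf("@"c, pos)
            If atPos < 0 Then Exit Do
            Dim finish As Integer = data.IndexOf(ControlChars.NullChar, atPos + 1)
            If finish < 0 Then finish = data.Length
            results.Add(data.Substring(atPos + 1, finish - atPos - 1))
            ' Continue searching after the entry just extracted.
            pos = data.IndexOf(marker, finish, StringComparison.OrdinalIgnoreCase)
        Loop
        Return results
    End Function

    Sub Main()
        ' Made-up sample: two entries, each ended by a null character.
        Dim sample As String = "xx" & "rcrast@http://ww3.yahoo.com/main/index.html" & _
            ControlChars.NullChar & "junk" & "rcrast@http://example.com/" & ControlChars.NullChar
        For Each url As String In ExtractAfterMarker(sample, "rcrast")
            Console.WriteLine(url)
        Next
    End Sub
End Module
```

Calling it with "cookie:" or "rcrast" then only changes the second argument, and the case-insensitive search means the data does not have to be lower-cased first.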
ASKER
That's so perfect, but what about a checkbox or something to extract by the http:// instead, and something to end the URL?
What do you mean by 'something to end the URL'?
ASKER
I don't think I mean anything by it, because it will be the same as with "cookie:".
Something like ".html"
You don't have to worry about that. I need to try "http://
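For spotting the URLs by http:// rather than by a marker, a regular expression is one possible approach. This is a minimal sketch under assumptions, not the accepted solution: the sample string is made up (it reuses the example URL from the question), and the pattern assumes a URL ends at the first whitespace, control character (including null), quote, or angle bracket.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HttpDemo
    Sub Main()
        ' Made-up sample standing in for the contents of the index file.
        Dim sample As String = "junk rcrast@http://ww3.yahoo.com/main/Prh13b/ps20060325-178782/index.html" & _
            ControlChars.NullChar & " more junk"

        ' Assumption: a URL starts at "http://" and stops at the first
        ' whitespace, control character, double quote, or angle bracket.
        Dim pattern As String = "http://[^\s\x00-\x1F""<>]+"

        For Each m As Match In Regex.Matches(sample, pattern, RegexOptions.IgnoreCase)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

With this approach nothing like ".html" is needed to end the URL, since the match stops by itself at the first character that cannot be part of one. If the real file ends its entries differently, the character class would need adjusting.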