• Status: Solved

Parse a URL for the New URL

Does anyone have an idea how to parse a URL for any URLs that it may redirect to?

http://www.some.com/redir.cgi?tranfer=www.another.com/index.htm
http://www.some.com/redir?URL=http://www.another.com/index.htm

What I'd like to do is examine URLs like the ones above and grab the new URLs:

http://www.another.com/index.htm
www.another.com/index.htm

Asked by: James
 
NPluis commented:
How about:
Mid$(s, InStr(InStr(1, s, "redir"), s, "=") + 1)
where s is the hyperlink as a string.
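
One caveat with the one-liner above: as far as I know, if "redir" does not occur in s, the inner InStr returns 0 and the outer InStr then raises run-time error 5, because its start position must be at least 1. A minimal guarded sketch follows. The function name AfterRedirEquals is hypothetical and not from the thread.

```vb
' Hypothetical helper (not from the thread): returns everything after the
' first "=" that follows "redir", or "" when either marker is missing.
Public Function AfterRedirEquals(ByVal s As String) As String
    Dim pRedir As Long
    Dim pEq As Long

    pRedir = InStr(1, s, "redir", vbTextCompare)
    If pRedir = 0 Then Exit Function      ' no "redir" in the link

    pEq = InStr(pRedir, s, "=")
    If pEq = 0 Then Exit Function         ' no "=" after "redir"

    AfterRedirEquals = Mid$(s, pEq + 1)
End Function
```

For the two sample links in the question this should give http://www.another.com/index.htm and www.another.com/index.htm. It still depends on the redirect script having "redir" in its name.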
 
rdrunner commented:
Hmmm...

You could try using a regexp to get the URLs out of the strings.

Let me check what would match all redirects.

I think this should work as the match string:


(((\:i).)+(\:c)+)
 
rdrunner commented:
Revised version here:

Test input: your question

Code:

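'Snipp
' Needs references to "Microsoft VBScript Regular Expressions 5.5"
' and "Microsoft Scripting Runtime".
Private Sub Command2_Click()

Dim oRegExp As New RegExp
Dim oMatches As MatchCollection
Dim oMatch As Match
Dim oFso As New FileSystemObject
Dim oInput As TextStream
Dim cLine As String

oRegExp.IgnoreCase = True
oRegExp.Global = True
oRegExp.Pattern = "(([a-zA-Z0-9-]+[\./]+)+[a-zA-Z0-9-]+)"

' Run the pattern against every line of the test file
Set oInput = oFso.OpenTextFile("c:\work\test.txt")
While Not oInput.AtEndOfStream
    cLine = oInput.ReadLine
    Set oMatches = oRegExp.Execute(cLine)
    ' Print the number of matches, then each match
    Debug.Print oMatches.Count
    For Each oMatch In oMatches
        Debug.Print oMatch.Value
    Next
Wend
oInput.Close

End Sub

'snapp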



'Debug output:

www.some.com/redir.cgi
www.another.com/index.htm
 2
www.some.com/redir
www.another.com/index.htm
 0
 0
 0
 1
www.another.com/index.htm
 1
www.another.com/index.htm
 
simon_thwaites commented:
Well, query strings are separated by ? [question mark] and & [ampersand]. If you first split the URL at the ?, you will be left with the script and the query string.

e.g.
http://www.domain.com/index.asp?id=2&url=www.test.com&test=3

You would first end up with
1: http://www.domain.com/index.asp
2: id=2&url=www.test.com&test=3


If you then split the query string at each &, you will have a list of the separate parameters.

You will then get
1: id=2
2: url=www.test.com
3: test=3

If you know what the parameter name is going to be, it's easy to pick out. If you don't, you will have to check each value manually and take into account the different ways a URL can be presented...
www.*.*
http://www.*.*
http://*.*
https://www.*.*
https://*.*

And there are loads more variations: a relative path such as "/newfolder/newfile.htm" could appear, in which case you would need to know the directory it is relative to (this can usually be picked up from the main URL).
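
The two splits described above could be sketched in VB6 roughly as follows, assuming the parameter name is known in advance. The function name QueryValue is hypothetical and not from the thread.

```vb
' Hypothetical helper (not from the thread): returns the value of the query
' parameter sName in sURL, or "" when it is absent.
Public Function QueryValue(ByVal sURL As String, ByVal sName As String) As String
    Dim vParts As Variant
    Dim vPairs As Variant
    Dim i As Long
    Dim pEq As Long

    ' Split once at "?" into the script and the query string
    vParts = Split(sURL, "?", 2)
    If UBound(vParts) < 1 Then Exit Function   ' no query string

    ' Split the query string into name=value pairs
    vPairs = Split(vParts(1), "&")
    For i = LBound(vPairs) To UBound(vPairs)
        pEq = InStr(vPairs(i), "=")
        If pEq > 0 Then
            ' Compare the parameter name case-insensitively
            If StrComp(Left$(vPairs(i), pEq - 1), sName, vbTextCompare) = 0 Then
                QueryValue = Mid$(vPairs(i), pEq + 1)
                Exit Function
            End If
        End If
    Next i
End Function
```

With the example URL above, QueryValue(sURL, "url") should return www.test.com. Note that a redirect target containing its own unencoded & would be cut short by this approach.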
 
rdrunner commented:
That's why I suggested regular expressions...

This is what I tossed together quite quickly, and it will match almost all URLs...

"(([a-zA-Z0-9-]+[\./]+)+[a-zA-Z0-9-]+)"
To break this up:

"[a-zA-Z0-9-]+" ---> any "word"

[\./]+  ---> followed by one or more . or / characters

([a-zA-Z0-9-]+[\./]+)+ ---> that pair, one or more times...

[a-zA-Z0-9-]+   ---> and a final "word" after it...

It will "match" almost all URLs




 
Gunsen commented:
Try this:
Option Compare Text ' module level; makes Like case-insensitive
sURL = "http://www.some.com/redir?URL=xhttp://www.another.com/index.htm"
p = InStr(sURL, "?")
If p > 0 Then ' Query string present
  p = InStr(p + 1, sURL, "=") ' first "=" after the "?"
  If p > 0 Then ' Parameter is assigned a value
    newURL = Mid$(sURL, p + 1)
    If (newURL Like "http:*") Or (newURL Like "www.*") Then
      MsgBox "NewURL=" & newURL
    End If
  End If
End If
 
Gunsen commented:
Sorry about the 'x' (just a test)
sURL = "http://www.some.com/redir?URL=http://www.another.com/index.htm"
 
rdrunner commented:
Not all sites start with www or http://

That's why you have to use a structural search and not a literal one...


For example:

daoc.4-players.de


 
James (author) commented:
Rdrunner's and Gunsen's code seems fine, except that it misses redirects like:

http://msid.msn.com/mps_id_sharing/redirect.asp?www.msnbc.com/news/default.asp


You know, I bet almost anything there is an API to do this, since IE does it all the time with URLs that have moved or are being redirected. I've seen APIs to crack URLs, so I suspect there is one!

 
rdrunner commented:
I am not missing that redirect, am I?

I should only miss the first URL, since it contains a "_", which I was not checking for ;) but I am able to get the redirect afterwards...

To "fix" that you only need to change one line:

oRegExp.Pattern = "(([a-zA-Z0-9-]+[\./]+)+[a-zA-Z0-9-]+)"

to

oRegExp.Pattern = "(([a-zA-Z0-9_-]+[\./]+)+[a-zA-Z0-9-]+)"

Now it will also catch URLs with _ in them...

Lemme try again ;)

OK, it caught the URL BEHIND the ?, and that is what was asked for ;)


Quote:
------
What I'd like to do is examine URLs like the ones above and grab the new URLs:

http://www.another.com/index.htm
www.another.com/index.htm
------


You need to give more accurate problem descriptions ;)

Here is what it caught after the fix, using the URL from your last post:

 2
msid.msn.com/mps_id_sharing/redirect.asp
www.msnbc.com/news/default.asp
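
In the outputs shown in this thread, the redirect target is always the last match on the line. A minimal sketch that relies on that observation, which is an assumption and not a general rule, could look like this. The function name LastUrlMatch is hypothetical, and the code needs a reference to "Microsoft VBScript Regular Expressions 5.5".

```vb
' Hypothetical helper (not from the thread): returns the last URL-like match
' in sLine, or "" when nothing matches.
Public Function LastUrlMatch(ByVal sLine As String) As String
    Dim oRegExp As New RegExp
    Dim oMatches As MatchCollection

    oRegExp.IgnoreCase = True
    oRegExp.Global = True
    ' Same pattern as the fixed version above (allows "_" in path segments)
    oRegExp.Pattern = "(([a-zA-Z0-9_-]+[\./]+)+[a-zA-Z0-9-]+)"

    Set oMatches = oRegExp.Execute(sLine)
    If oMatches.Count > 0 Then
        ' MatchCollection is zero-based, so the last item is Count - 1
        LastUrlMatch = oMatches(oMatches.Count - 1).Value
    End If
End Function
```

Because the pattern has no ":" in its character classes, a leading http:// is not part of the returned match, just as in the debug output earlier in the thread.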
 
Rhaedes commented:
It strikes me that you could just send a WebBrowser control (or similar) to the URL, wait for it to settle (ReadyState) and then read off the final URL. This would cover all the above cases, and also catch script redirects and meta tag refreshes.
Kindest regards,
Rhaedes
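
A rough sketch of this idea, assuming a form with a WebBrowser control named WebBrowser1 (from the Microsoft Internet Controls component) and a button named Command1. Both names are assumptions, and the URL is just the sample from the question.

```vb
' Assumes a form with a WebBrowser control named WebBrowser1 and a
' command button named Command1 (hypothetical names).
Private Sub Command1_Click()
    WebBrowser1.Navigate "http://www.some.com/redir?URL=http://www.another.com/index.htm"
End Sub

Private Sub WebBrowser1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    ' DocumentComplete also fires for each frame; only the top-level
    ' document has pDisp equal to the control itself.
    If pDisp Is WebBrowser1.Object Then
        Debug.Print "Settled on: " & WebBrowser1.LocationURL
    End If
End Sub
```

A script redirect or meta refresh starts a new navigation after the first page has completed, so this may print more than once; the last line printed would be the final URL.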
 
simon_thwaites commented:
ohmeohmy,

Why would you need an API when it is very easy to parse yourself?
 
James (author) commented:
Whoops... forgot this...

Simon, what makes you think I needed an API? I am the last one in the world to ever know what I need....




 
simon_thwaites commented:
Because you said...

"You know, I bet almost anything there is an API to do this, since IE does it all the time with URLs that have moved or are being redirected. I've seen APIs to crack URLs, so I suspect there is one!"