Parse a URL for the New URL

Posted on 2003-02-20
Last Modified: 2012-05-04
Does anyone have an idea how to parse a URL to find any URL it may redirect to?

http://www.some.com/redir.cgi?tranfer=www.another.com/index.htm
http://www.some.com/redir?URL=http://www.another.com/index.htm

What I'd like to do is examine URLs like the ones above and extract the new URLs:

http://www.another.com/index.htm
www.another.com/index.htm

Question by:James
 

Expert Comment

by:NPluis
ID: 7986402
How about:
Mid$(s, InStr(InStr(1, s, "redir"), s, "=") + 1)
where s is the hyperlink as a string.
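A minimal Python sketch of the same idea (find "redir", then take everything after the next "="), with the function name made up for illustration. It returns None when either marker is missing; in the VB one-liner a missing "redir" makes the outer InStr start at position 0, which raises a run-time error, and a missing "=" makes Mid$(s, 1) return the whole string.

```python
def after_redir_equals(s):
    """Return the text after the first '=' that follows 'redir', or None."""
    start = s.find("redir")          # like InStr(1, s, "redir"), but -1 if absent
    if start == -1:
        return None
    eq = s.find("=", start)          # first '=' at or after "redir"
    if eq == -1:
        return None
    return s[eq + 1:]                # like Mid$(s, eq + 1) in 1-based VB terms


if __name__ == "__main__":
    print(after_redir_equals("http://www.some.com/redir.cgi?tranfer=www.another.com/index.htm"))
    print(after_redir_equals("http://www.some.com/redir?URL=http://www.another.com/index.htm"))
```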
 

Expert Comment

by:rdrunner
ID: 7986433
Hmm...

You could try using a regular expression to pull the URLs out of the strings.

Let me check what would match all the redirects.

I think this should work as the match string:


(((\:i).)+(\:c)+)
 

Expert Comment

by:rdrunner
ID: 7986571
Revised version here:

Input to test: your question

Code:

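'Snipp
' Requires project references to "Microsoft VBScript Regular Expressions 5.5"
' (RegExp, MatchCollection, Match) and "Microsoft Scripting Runtime"
' (FileSystemObject, TextStream).
Private Sub Command2_Click()

Dim oRegExp As New RegExp
Dim oMatches As MatchCollection
Dim oMatch As Match
Dim oFso As New FileSystemObject
Dim oInput As TextStream
Dim cLine As String

oRegExp.IgnoreCase = True
oRegExp.Global = True   ' return every match on the line, not just the first
' one or more "word followed by . or /" groups, then a final word
oRegExp.Pattern = "(([a-zA-Z0-9-]+[\./]+)+[a-zA-Z0-9-]+)"

Set oInput = oFso.OpenTextFile("c:\work\test.txt")
While Not oInput.AtEndOfStream
    cLine = oInput.ReadLine
    Set oMatches = oRegExp.Execute(cLine)
    Debug.Print oMatches.Count   ' number of URL-like matches on this line
    For Each oMatch In oMatches
        Debug.Print oMatch.Value
    Next
Wend
oInput.Close

End Sub

'snapp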



'Debug output:

www.some.com/redir.cgi
www.another.com/index.htm
 2
www.some.com/redir
www.another.com/index.htm
 0
 0
 0
 1
www.another.com/index.htm
 1
www.another.com/index.htm
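For anyone who wants to try the same pattern outside VB, here is a minimal Python sketch using the standard re module; finditer plays the role of RegExp.Execute with Global = True. The function name url_like_matches is made up for the example.

```python
import re

# Same pattern as the VB code: one or more "word followed by . or /" groups,
# then a final word. Characters such as ':', '?', '=' and '_' end a match.
URLISH = re.compile(r"(([a-zA-Z0-9-]+[./]+)+[a-zA-Z0-9-]+)", re.IGNORECASE)


def url_like_matches(line):
    """Return every non-overlapping match on the line (like RegExp.Global = True)."""
    return [m.group(0) for m in URLISH.finditer(line)]


if __name__ == "__main__":
    for line in [
        "http://www.some.com/redir.cgi?tranfer=www.another.com/index.htm",
        "http://www.some.com/redir?URL=http://www.another.com/index.htm",
    ]:
        found = url_like_matches(line)
        print(len(found))
        for f in found:
            print(f)
```

As in the VB output, the "http://" prefix is not part of a match, because ':' is not in the character classes.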

Expert Comment

by:simon_thwaites
ID: 7987161
Well, the query string is separated from the script by ? (question mark), and the individual parameters are separated by & (ampersand). If you first split the URL on the ?, you will be left with the script and the query string.

e.g.
http://www.domain.com/index.asp?id=2&url=www.test.com&test=3

you would first end up with
1: http://www.domain.com/index.asp
2: id=2&url=www.test.com&test=3


If you then split the query string on &, you will have a list of the separate parameters:

1: id=2
2: url=www.test.com
3: test=3
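The two splits described above can be sketched in Python like this; split_query is a made-up name, and values are returned exactly as they appear (no percent-decoding).

```python
def split_query(url):
    """Split a URL on the first '?' and then the query on '&'.

    Returns (script_part, [(name, value), ...]).
    """
    script, sep, query = url.partition("?")
    params = []
    if sep:  # there was a '?'
        for pair in query.split("&"):
            if not pair:
                continue  # tolerate "a=1&&b=2" or a trailing '&'
            name, _, value = pair.partition("=")  # split on the first '=' only
            params.append((name, value))
    return script, params


if __name__ == "__main__":
    script, params = split_query("http://www.domain.com/index.asp?id=2&url=www.test.com&test=3")
    print(script)
    for i, (name, value) in enumerate(params, 1):
        print(f"{i}: {name}={value}")
```

Python's urllib.parse.parse_qsl does a similar split and also percent-decodes the values.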

If you know what the parameter name is going to be, it's easy to pick out. If you don't, you will have to check each value yourself and take into account the different ways a URL can be written:
www.*.*
http://www.*.*
http://*.*
https://www.*.*
https://*.*

And there are many more variations. A relative path such as "/newfolder/newfile.htm" could appear, in which case you need to know the directory of the current page (which can usually be taken from the main URL).
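Resolving a relative target against the page's own URL is what urllib.parse.urljoin does in Python. A minimal sketch follows; the rule that a bare "www." value is a host name (so "http://" is prefixed) is an assumption for illustration, since urljoin on its own would treat "www.test.com" as a relative path.

```python
from urllib.parse import urljoin


def resolve_redirect_target(page_url, target):
    """Turn a redirect target into an absolute URL (hypothetical helper).

    "/newfolder/newfile.htm" resolves against the host of page_url,
    "newfile.htm" against its directory, and absolute URLs pass through.
    ASSUMPTION: a value starting with "www." is a host, not a relative path.
    """
    if target.lower().startswith("www."):
        return "http://" + target
    return urljoin(page_url, target)


if __name__ == "__main__":
    page = "http://www.domain.com/folder/index.asp"
    print(resolve_redirect_target(page, "/newfolder/newfile.htm"))
    print(resolve_redirect_target(page, "newfile.htm"))
    print(resolve_redirect_target(page, "www.test.com"))
```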

Expert Comment

by:rdrunner
ID: 7987204
That's why I suggested regular expressions.

This is what I tossed together quite quickly, and it will match almost all URLs:

"(([a-zA-Z0-9-]+[\./]+)+[a-zA-Z0-9-]+)"
To break this up:

"[a-zA-Z0-9-]+" ---> any "word" (letters, digits and hyphens)

[\./]+  ---> followed by one or more . or /

([a-zA-Z0-9-]+[\./]+)+ ---> that combination, one or more times...

[a-zA-Z0-9-]+   ---> and a final "word" after it.

So it matches any run of words joined by dots or slashes, which covers most host names and paths.
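The same breakdown can be written directly into the pattern with Python's re.VERBOSE flag, where whitespace and # comments outside character classes are ignored. This sketch just checks that the commented form behaves like the compact one.

```python
import re

# rdrunner's pattern, piece by piece.
URLISH_VERBOSE = re.compile(r"""
    (                    # whole match
      (                  # one "word" plus separator...
        [a-zA-Z0-9-]+    #   any "word" (letters, digits, hyphen)
        [./]+            #   followed by one or more '.' or '/'
      )+                 # ...one or more times
      [a-zA-Z0-9-]+      # and a final "word" after it
    )
""", re.VERBOSE)

COMPACT = re.compile(r"(([a-zA-Z0-9-]+[./]+)+[a-zA-Z0-9-]+)")


def matches(pattern, text):
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    s = "http://www.some.com/redir?URL=http://www.another.com/index.htm"
    print(matches(URLISH_VERBOSE, s))
    print(matches(COMPACT, s))
```

Note that the pattern also accepts non-URL text of the same shape, such as a version number like "1.2".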





Expert Comment

by:Gunsen
ID: 7987825

Expert Comment

by:Gunsen
ID: 7987862
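Try this:
' Option Compare Text belongs in the module's declarations section;
' it makes Like case-insensitive.
Option Compare Text

Private Sub ShowNewURL()   ' Sub name is arbitrary
  Dim sURL As String, newURL As String
  Dim p As Long
  sURL = "http://www.some.com/redir?URL=xhttp://www.another.com/index.htm"
  p = InStr(sURL, "?")
  If p > 0 Then ' Parameter
    newURL = Mid$(sURL, p + 1)
    p = InStr(p, sURL, "=")   ' look for "=" only after the "?"
    If p > 0 Then ' Assigned to
      newURL = Mid$(sURL, p + 1)
      If (newURL Like "http:*") Or (newURL Like "www.*") Then
        MsgBox "NewURL=" & newURL
      End If
    End If
  End If
End Sub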

Expert Comment

by:Gunsen
ID: 7987872
Sorry about the 'x' (it was just a test). That line should read:
sURL = "http://www.some.com/redir?URL=http://www.another.com/index.htm"
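A minimal Python sketch of the same literal approach: take the text after the first "=" following the "?", and keep it only if it starts with "http:" or "www." (case-insensitive, like Option Compare Text). The function name is made up for illustration.

```python
def new_url_after_equals(url):
    """Return the value after the first '=' in the query if it looks like a URL."""
    q = url.find("?")
    if q == -1:
        return None                  # no parameters at all
    eq = url.find("=", q)
    if eq == -1:
        return None                  # parameter without an assignment
    candidate = url[eq + 1:]
    if candidate.lower().startswith(("http:", "www.")):
        return candidate
    return None


if __name__ == "__main__":
    print(new_url_after_equals("http://www.some.com/redir?URL=http://www.another.com/index.htm"))
```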

Expert Comment

by:rdrunner
ID: 7988319
Not all sites start with www or http://.

That's why you have to use a structural search, not a literal one.


For example:

daoc.4-players.de
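A small Python sketch of the difference: a prefix test in the style of Gunsen's code rejects daoc.4-players.de, while rdrunner's pattern accepts it. Both function names are made up for illustration.

```python
import re

URLISH = re.compile(r"(([a-zA-Z0-9-]+[./]+)+[a-zA-Z0-9-]+)")


def literal_check(s):
    # literal test: must start with "http:" or "www."
    return s.lower().startswith(("http:", "www."))


def structural_check(s):
    # structural test: the whole string is word(./)word(./)...word
    return URLISH.fullmatch(s) is not None


if __name__ == "__main__":
    for host in ["www.another.com/index.htm", "daoc.4-players.de"]:
        print(host, literal_check(host), structural_check(host))
```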



Author Comment

by:James
ID: 7991026
Rdrunner's and Gunsen's code would seem fine if they didn't miss redirects like:

http://msid.msn.com/mps_id_sharing/redirect.asp?www.msnbc.com/news/default.asp


You know, I bet almost anything there is an API to do this, since IE does it all the time with URLs that have moved or are being redirected. I've seen APIs to crack URLs, so I suspect there is one!


Accepted Solution

by:rdrunner (earned 400 total points)
ID: 7996398
I am not missing that redirect, am I?

I should only miss the first URL, since it contains a "_", which I was not checking for ;) but I still get the redirect afterwards...

To "fix" that you only need to change one line:

oRegExp.Pattern = "(([a-zA-Z0-9-]+[\./]+)+[a-zA-Z0-9-]+)"

to

oRegExp.Pattern = "(([a-zA-Z0-9_-]+[\./]+)+[a-zA-Z0-9_-]+)"

Now it will also catch URLs with _ in them...

Let me try again ;)

OK, it caught the URL after the ?, and that is what was asked ;)


Quote:
------
What I'd like to do is examine URLs like the ones above and extract the new URLs:

http://www.another.com/index.htm
www.another.com/index.htm
------


You need to give more accurate problem descriptions ;)

Here is what it caught from the URL in your last post, after the fix:

 2
msid.msn.com/mps_id_sharing/redirect.asp
www.msnbc.com/news/default.asp
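The fixed pattern in a minimal Python sketch, run on the msid.msn.com URL. Here "_" is added to both character classes, so that a final segment such as "my_page" is not cut short; the helper name is made up.

```python
import re

# Word characters now include '_' in both the repeated group and the final word.
URLISH = re.compile(r"(([a-zA-Z0-9_-]+[./]+)+[a-zA-Z0-9_-]+)")


def url_like_matches(line):
    return [m.group(0) for m in URLISH.finditer(line)]


if __name__ == "__main__":
    line = "http://msid.msn.com/mps_id_sharing/redirect.asp?www.msnbc.com/news/default.asp"
    found = url_like_matches(line)
    print(len(found))
    for f in found:
        print(f)
```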

Expert Comment

by:Rhaedes
ID: 8004303
It strikes me that you could just point a WebBrowser control (or something similar) at the URL, wait for it to settle (ReadyState complete) and then read off the final URL. This would cover all the cases above, and would also catch script redirects and meta-tag refreshes.
Kindest regards,
Rhaedes
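A lighter, non-browser variant of this idea in Python: request the URL with urllib, let it follow HTTP 3xx redirects, and read the final URL from the response. Unlike a WebBrowser control, this sketch only follows HTTP-level redirects (Location headers), not script redirects or meta refreshes. The small local server is only there so the example runs without network access; all names are made up.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def final_url(url, opener=None, timeout=10.0):
    """Request url, follow HTTP redirects, and return the URL we ended up at."""
    opener = opener or urllib.request.build_opener()
    with opener.open(url, timeout=timeout) as resp:
        return resp.geturl()


class _RedirectingHandler(BaseHTTPRequestHandler):
    """Demo server: /redir... answers 302 -> /index.htm, anything else 200."""

    def do_GET(self):
        if self.path.startswith("/redir"):
            self.send_response(302)
            self.send_header("Location", "/index.htm")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"<html><body>new page</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def demo():
    server = HTTPServer(("127.0.0.1", 0), _RedirectingHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # ProxyHandler({}) ignores proxy settings from the environment for this local test.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        return final_url(f"http://127.0.0.1:{port}/redir?URL=whatever", opener=opener)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```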

Expert Comment

by:CleanupPing
ID: 8901391
ohmeohmy:
This old question needs to be finalized -- accept an answer, split points, or get a refund.  For information on your options, please click here-> http:/help/closing.jsp#1 
Experts: Post your closing recommendations!  Who deserves points here?

Expert Comment

by:simon_thwaites
ID: 8901709
ohmeohmy,

Why would you need an API when it is very easy to parse yourself?

Author Comment

by:James
ID: 8902428
Whoops... forgot this...

Simon, what makes you think I needed an API? I am the last one in the world to ever know what I need....





Expert Comment

by:simon_thwaites
ID: 8902635
Because you said...

"You know, I bet almost anything there is an API to do this, since IE does it all the time with URLs that have moved or are being redirected. I've seen APIs to crack URLs, so I suspect there is one!"