Parsing (getting) URLs from text

I want to know how to parse a string so that I can extract all the URLs (and only the URLs) from it. Suppose that this is the text:
"The Guqin is the modern name for a plucked seven-string Chinese musical instrument of the zither family. For more info on it, check this page: http://en.wikipedia.org/wiki/Guqin."
I would want to know how to get just "http://en.wikipedia.org/wiki/Guqin" from that.
There are a few things that will need to be considered. URLs will not always be separated by spaces or commas; sometimes a URL is followed by a period because it sits at the end of a sentence. Not all URLs will end in a file extension (the one above does not). The URLs can appear anywhere in the text, and there is no fixed formatting. All URLs will start with http://, which should make things easier.

Thanks in advance.
codemaster3 Asked:

g_johnson Commented:
Can it be assumed that each URL will be terminated by one of these: space, comma, period?

If so, I would search the string (using the InStr function) for "http://" (P1)
then, starting after P1, for a space (P2)
then for a comma (P3)
then for a period (P4)

Then I would take the string between P1 and the smallest of P2, P3, and P4.

Does that help?

P.S. I'm fairly new to .NET, so maybe IndexOf instead of InStr.
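The steps above can be sketched roughly as follows. This is only an illustration in Python, not the poster's code; the function name `extract_urls_naive` and its `delimiters` parameter are made up for the example, and `str.find` plays the role of InStr/IndexOf. The loop at the end repeats the search so that every URL in the string is visited.

```python
def extract_urls_naive(text, delimiters=(" ", ",", ".")):
    """Cut each URL at the nearest delimiter found after "http://"."""
    scheme = "http://"
    urls = []
    start = text.find(scheme)                      # P1
    while start != -1:
        body_start = start + len(scheme)
        # P2, P3, P4: position of each delimiter after the scheme (-1 if absent)
        positions = [text.find(d, body_start) for d in delimiters]
        found = [p for p in positions if p != -1]
        end = min(found) if found else len(text)   # smallest of P2, P3, P4
        urls.append(text[start:end])
        start = text.find(scheme, end)             # repeat for the next URL
    return urls
```

Note that with a period in the delimiter set, the cut happens at the first period inside the host name, so this sketch is mainly useful for seeing where the approach falls short.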
g_johnson Commented:
Oh, and by the way: repeat as necessary to find more URLs in the same string.
codemaster3 (Author) Commented:
Yes, but there would still be a few things you'd have to check for. For example, what if this is the text:
"http://www.site.com/something.somethingelse/file.rar. http://anothersite.com"
Then parsing by periods would return only "http://www" and "http://anothersite", whereas parsing by spaces would return "http://www.site.com/something.somethingelse/file.rar." and "http://anothersite.com". The unnecessary period at the end of the first URL would break it.
codemaster3 (Author) Commented:
Oh, and also a newline character could separate the URLs.
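One way to deal with the trailing period and the newline case together is to split on any whitespace and then trim sentence punctuation from the end of each token. The sketch below is an assumption about how that could look, not something proposed in the thread; `extract_urls_by_whitespace` and the set of trimmed characters are hypothetical choices.

```python
def extract_urls_by_whitespace(text):
    """Split on whitespace (spaces, tabs, newlines) and trim trailing punctuation."""
    trailing = ".,;:!?\"')"   # assumed set of characters that only end a sentence
    urls = []
    for token in text.split():            # split() with no argument handles newlines too
        idx = token.find("http://")
        if idx == -1:
            continue
        urls.append(token[idx:].rstrip(trailing))
    return urls
```

The trade-off is that a URL which legitimately ends in one of the trimmed characters would be clipped, and two URLs joined by a comma with no space would come back as one token.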
g_johnson Commented:
Yep, I forgot about the period after www, etc.
Hmm, I'm not sure how to solve this.
Is the string all URLs, or is there other information in it?
codemaster3 (Author) Commented:
Other info. Basically, it's supposed to let you paste a big block of text that also contains descriptions of each file, without you having to filter out the extra info first.
g_johnson Commented:
I honestly don't know what to do. The fact that a period can be the separator messes the whole thing up. I might resort to "testing" the strings after I've parsed them out. For example, by parsing "http://www.yahoo.com.  And then another sentence." I get:

http://www then
http://www.yahoo then
http://www.yahoo.com and finally
http://www.yahoo.com.  And then another sentence

The first three would return a valid web address in a browser and the last one wouldn't, so #3 (the longest candidate that is still valid) is my URL.

Check this link to test for valid urls:
http://vbnet.mvps.org/index.html?code/fileapi/pathisurl.htm

let me know if that helps
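A rough sketch of this "grow the candidate and test it" idea is below. The validity check here is a hypothetical stand-in: `looks_like_url` uses a simple regular expression (a host with at least one dot, and an optional path that does not end in a period or comma) rather than the API on the linked page, and it is stricter than a browser, so it rejects "http://www" where the post above counts it as valid. The names `looks_like_url` and `longest_valid_url` are invented for the example.

```python
import re

# Hypothetical validity check: host with at least one dot, optional path
# that does not end in a period or comma. A real check could be stricter.
_URL_SHAPE = re.compile(r"http://[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+(/(\S*[^\s.,])?)?")

def looks_like_url(candidate):
    return _URL_SHAPE.fullmatch(candidate) is not None

def longest_valid_url(text, start=0):
    """Return the longest candidate (cut at a delimiter) that passes the check."""
    begin = text.find("http://", start)
    if begin == -1:
        return None
    best = None
    for end in range(begin + len("http://") + 1, len(text) + 1):
        at_end = end == len(text)
        if at_end or text[end] in " \n\t.,":
            candidate = text[begin:end]
            if looks_like_url(candidate):
                best = candidate
            # A URL cannot contain whitespace, so stop growing here.
            if not at_end and text[end].isspace():
                break
    return best
```

Under these assumptions, the yahoo sentence above yields "http://www.yahoo.com", and a URL with periods in its path keeps them while the sentence-ending period is dropped.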
codemaster3 (Author) Commented:
Yeah, I think I might just do this. I had imagined that I would have to resort to something like this, although I had wondered whether there was an easier way, something made specifically for this. Thanks for the help anyway :)
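For what it's worth, a regular expression is one commonly used tool for this kind of extraction, and .NET provides one through the Regex class in System.Text.RegularExpressions. The sketch below shows the idea in Python instead; the pattern and the name `extract_urls` are assumptions chosen to match the constraints stated in this thread (every URL starts with http://, and whitespace or a comma ends it), not a general-purpose URL matcher.

```python
import re

# Assumed pattern: "http://" followed by anything that is not whitespace or a comma.
URL_PATTERN = re.compile(r"http://[^\s,]+")

def extract_urls(text):
    """Find every match, then trim sentence punctuation from the end."""
    return [m.group(0).rstrip(".;:!?\"')") for m in URL_PATTERN.finditer(text)]

text = ("The Guqin is the modern name for a plucked seven-string Chinese musical "
        "instrument of the zither family. For more info on it, check this page: "
        "http://en.wikipedia.org/wiki/Guqin.")
print(extract_urls(text))
```

As with any trimming approach, a URL that really does end in one of the stripped characters would lose it.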
g_johnson Commented:
hey, we tried, right?!   LOL
codemaster3 (Author) Commented:
Hehe, yea, thanks.
Visual Basic.NET

Question has a verified solution.