REGEX Help for Domain Name + Path

Hello,

I am sifting through a page of text and want to use preg_match on it to find URLs that match a given pattern and capture them.

What regex would I use to find "http://domain.com/page/name", where:

"domain" could be "domain1" or "mydomain3"
"page/name" can be any string, with or without slashes, of any length
the scheme can be "http" or "https"
the "www." prefix may or may not be present


Thank you very much!

Asked by: EffinGood
 
Dan Craciun (IT Consultant) commented:
Try
'%http[s]{0,1}://(www\.){0,1}(domain1|mydomain3)\.com(/\w+(\.\w*)*)*%'
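
As a rough sketch of how the pattern above might be used to collect every matching URL from a block of text: the sample text below is made up for illustration, and preg_match_all is used instead of preg_match on the assumption that the page contains more than one URL.

```php
<?php
// Pattern from the answer above. Using % as the delimiter means the
// slashes in the URL do not need to be escaped.
$pattern = '%http[s]{0,1}://(www\.){0,1}(domain1|mydomain3)\.com(/\w+(\.\w*)*)*%';

// Hypothetical sample text, invented for this example.
$text = 'See http://domain1.com/page/name and https://www.mydomain3.com/a/b.html, '
      . 'but not http://other.com/page.';

// preg_match_all finds every match; $matches[0] holds the full matched URLs.
preg_match_all($pattern, $text, $matches);

print_r($matches[0]);
```

Note that \w only covers letters, digits and underscores, so a path containing a hyphen or a query string would be cut off at the first such character. If those need to be captured, the character class would have to be widened.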


HTH,
Dan
 
EffinGood (Author) commented:
Thanks Dan. I think I love you.
 
Dan Craciun (IT Consultant) commented:
Glad I could help!

But I'm afraid I don't feel the same way... :)
 
EffinGood (Author) commented:
That's ok. I understand. We can still be friends. :)
 
Dan Craciun (IT Consultant) commented:
On another forum I've been asked to:
1. stop using {0,1} and use the optional quantifier (?) instead, the reason being that it's easier to read for the "properly" trained regexp specialists.
2. use non-capturing groups (?:...) when possible, to speed up matches a little (because the regex engine does not need to keep track of those groups).

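So, below is the same regex written a little more "canonically". It is nearly equivalent: the path part now uses \w* instead of \w+, so it also accepts empty path segments such as a trailing slash: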

'%https?://(?:www\.)?(?:domain1|mydomain3)\.com(?:/\w*(?:\.\w*)*)*%'
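
A small sketch comparing the two patterns may help here. The test URLs are invented for illustration. It shows that the non-capturing version returns only the full match, and that the two versions treat a trailing slash differently.

```php
<?php
$original  = '%http[s]{0,1}://(www\.){0,1}(domain1|mydomain3)\.com(/\w+(\.\w*)*)*%';
$rewritten = '%https?://(?:www\.)?(?:domain1|mydomain3)\.com(?:/\w*(?:\.\w*)*)*%';

// Hypothetical URL without a trailing slash: both patterns match it in full.
$url = 'https://www.domain1.com/page/name';
preg_match($original, $url, $a);
preg_match($rewritten, $url, $b);

var_dump($a[0] === $b[0]); // same full match
var_dump(count($a) > 1);   // original also stores its capture groups
var_dump(count($b) === 1); // rewritten stores the full match only

// Hypothetical URL with a trailing slash: /\w+ needs at least one word
// character after the slash, /\w* does not.
$slashUrl = 'http://domain1.com/page/';
preg_match($original, $slashUrl, $c);
preg_match($rewritten, $slashUrl, $d);

echo $c[0], "\n"; // stops before the trailing slash
echo $d[0], "\n"; // includes the trailing slash
```

If the original behaviour is wanted, changing /\w* back to /\w+ in the rewritten pattern should restore it.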


HTH,
Dan
Question has a verified solution.
