Solved

REGEX Help for Domain Name + Path

Posted on 2014-03-07
Last Modified: 2014-03-07
Hello,

I am sifting through a page of text and want to use preg_match on it to find URLs that match a profile and capture them.

What regex would I use to find "http://domain.com/page/name", where:

"domain" could be "domain1" or "mydomain3"
"page/name" is any string, with or without slashes, repeated any number of times
the scheme could be "http" or "https"
the "www." prefix may or may not be present


Thank you very much!







Question by:EffinGood

Accepted Solution

by:
Dan Craciun earned 500 total points
ID: 39913324
Try
'%http[s]{0,1}://(www\.){0,1}(domain1|mydomain3)\.com(/\w+(\.\w*)*)*%'
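
For reference, here is a minimal sketch of how a pattern like this could be run over a page of text with preg_match_all, which collects every match rather than only the first. The sample text in $text is made up for illustration and is not from the original question.

```php
<?php
// Pattern from the answer above; '%' is the delimiter, so '/' needs no escaping.
$pattern = '%http[s]{0,1}://(www\.){0,1}(domain1|mydomain3)\.com(/\w+(\.\w*)*)*%';

// Hypothetical page text, invented for this example.
$text = 'See http://domain1.com/page/name and https://www.mydomain3.com/a/b/c.html, '
      . 'but not http://other.com/page or ftp://domain1.com/file.';

// preg_match_all() returns the number of full matches (or false on error).
$count = preg_match_all($pattern, $text, $matches);

// With the default flags, $matches[0] holds the full matched URLs
// and $matches[2] holds what the (domain1|mydomain3) group captured.
$urls    = $matches[0];
$domains = $matches[2];

print_r($urls);
print_r($domains);
```

Note that \w does not match a hyphen, so a path such as /page-name would be cut off at the hyphen. If such paths are expected, the character class would need widening.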


HTH,
Dan

Author Closing Comment

by:EffinGood
ID: 39913382
Thanks Dan. I think I love you.

Expert Comment

by:Dan Craciun
ID: 39913399
Glad I could help!

But I'm afraid I don't feel the same way... :)

Author Comment

by:EffinGood
ID: 39913652
That's ok. I understand. We can still be friends. :)

Expert Comment

by:Dan Craciun
ID: 39914227
I've been asked on another forum to:
1. stop using {0,1} and use the optional operator (?) instead, since it's easier to read for the "properly" trained regex specialists.
2. use non-capturing groups (?:) where possible, to speed up matching a little (the regex engine does not need to keep track of those groups).

So, below is a nearly equivalent regex, written a little more "canonically". One difference: the path segment is now \w* instead of \w+, so it also matches empty segments such as a trailing slash.
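
As a rough illustration of those two points, the sketch below runs a shortened, hypothetical pattern (only the scheme, the optional "www." and one domain) in both styles and compares the resulting match arrays.

```php
<?php
$url = 'https://www.domain1.com';

// Capturing group with {0,1}: the engine records what "www." matched.
preg_match('%http[s]{0,1}://(www\.){0,1}domain1\.com%', $url, $capturing);

// Non-capturing group with ?: the same text is matched, but no group is stored.
preg_match('%https?://(?:www\.)?domain1\.com%', $url, $nonCapturing);

print_r($capturing);     // full match plus the "www." group
print_r($nonCapturing);  // full match only
```

Both calls match the same text; the only difference is whether index 1 of the match array is populated.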

'%https?://(?:www\.)?(?:domain1|mydomain3)\.com(?:/\w*(?:\.\w*)*)*%'
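
To check how closely the two patterns agree, here is a small sketch that runs both against a few made-up URLs (the sample strings are assumptions, not from the thread). The two versions agree on ordinary paths. Because the rewritten pattern uses \w* where the original used \w+, they differ on a trailing slash.

```php
<?php
$first  = '%http[s]{0,1}://(www\.){0,1}(domain1|mydomain3)\.com(/\w+(\.\w*)*)*%';
$second = '%https?://(?:www\.)?(?:domain1|mydomain3)\.com(?:/\w*(?:\.\w*)*)*%';

// Hypothetical inputs chosen to show where the patterns agree and differ.
$samples = [
    'http://domain1.com/page/name',
    'https://www.mydomain3.com/index.php',
    'http://domain1.com/page/',          // trailing slash
];

$results = [];
foreach ($samples as $s) {
    preg_match($first, $s, $a);
    preg_match($second, $s, $b);
    // Index 0 is the full match in both cases.
    $results[$s] = ['first' => $a[0], 'second' => $b[0]];
}

print_r($results);
```

For the third sample, the first pattern stops at "http://domain1.com/page", while the second also consumes the final slash.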


HTH,
Dan