aish13

PCRE for URL / Hostname match

Hi -

I am looking for a PCRE pattern that matches URLs like these:

http://[InputExpression] OR https://[InputExpression]
http://[InputExpression]:80 OR https://[InputExpression]:443
http://[InputExpression]/ OR https://[InputExpression]/
http://[InputExpression]:80/ OR https://[InputExpression]:443/

The [InputExpression] value could be either a hostname such as "myashish" or an IP address.

The PCRE should match only the forms above. If anything else follows in the URL, e.g. http://mypgetrain/helloworld or http://mypgetrain/helloworld/cust (a path at the end), it should not match. A match should occur only when the input is exactly one of the URLs above.

Any help in this regard would be really appreciated.

Regards
Ashish
Adam314

If you are using Perl, you can use a delimiter other than /, which makes the pattern easier to read. This will work:
    m#^https?://[\w\.]+(:80|:443)?/?$#

If not, and you have to use / as the delimiter, then the / needs to be escaped in the RE:
    /^https?:\/\/[\w\.]+(:80|:443)?\/?$/
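
A minimal sketch of how this pattern behaves, written in PHP (the language used later in the thread) on the assumption that its preg functions accept the same pattern. The sample URLs are illustrative only. Note that [\w\.]+ accepts any hostname or IP address rather than one specific value, and that the pattern does not tie port 80 to http or port 443 to https.

```php
<?php
// Adam314's pattern, with # as the delimiter so the slashes need no escaping.
$pattern = '#^https?://[\w\.]+(:80|:443)?/?$#';

// Illustrative URLs (assumed for this sketch, not taken from a real system).
$urls = [
    'http://mycompanytrain',
    'https://172.21.141.208:443/',
    'http://mycompanytrain/helloworld',
    'http://mycompanytrain:8080',
];

foreach ($urls as $url) {
    $result = preg_match($pattern, $url) === 1 ? 'match' : 'no match';
    echo $url, ' => ', $result, PHP_EOL;
}
```

The first two URLs should match; the last two should not, because a path or an unlisted port follows the host.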
Terry Woods

$pattern = "@https?://".preg_quote($inputExpression,"@").":(?:80|443)?/?@";

if (preg_match($pattern, $url)) {
  print "Great";
} else {
  print "Blurgh";
}
Seeing Adam314's suggestion, you'll want the start- and end-of-line anchors too:

$pattern = "@^https?://".preg_quote($inputExpression,"@").":(?:80|443)?/?$@";
Apologies - there was a mistake in my suggestions. This should do it:

$pattern = "@^https?://".preg_quote($inputExpression,"@")."(?:\:(?:80|443))?/?$@";
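
To illustrate what the correction changes, here is a small sketch comparing the earlier anchored pattern with the corrected one. The variable names $colonRequired and $portOptional are made up for this example, and the hostname is the one used elsewhere in the thread. In the earlier version the colon sits outside the optional part, so a URL without a colon cannot match.

```php
<?php
$inputExpression = 'mycompanytrain'; // hostname from the thread, used as sample input

$host = preg_quote($inputExpression, '@');

// Earlier version: the colon is outside the optional group, so it is mandatory.
$colonRequired = '@^https?://' . $host . ':(?:80|443)?/?$@';
// Corrected version: the colon and the port number are optional together.
// An unescaped ":" is equivalent to the "\:" written in the thread.
$portOptional = '@^https?://' . $host . '(?::(?:80|443))?/?$@';

$urls = ['http://mycompanytrain', 'http://mycompanytrain:80/', 'http://mycompanytrain:'];
foreach ($urls as $url) {
    printf(
        "%-28s colonRequired=%d portOptional=%d\n",
        $url,
        preg_match($colonRequired, $url),
        preg_match($portOptional, $url)
    );
}
```

Only the corrected pattern accepts the plain http://mycompanytrain form, and only the earlier one accepts a dangling colon with no port.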
aish13

ASKER

Hi - Would it be possible for you to create a PCRE with the input expression hardcoded as "mycompanytrain"? I am very new to PCRE and couldn't figure out how to do it.

Regards
Ashish

ASKER CERTIFIED SOLUTION
Terry Woods

This solution is only available to members.

preg_quote escapes any regular expression special characters that are also valid in a URL, such as full stops. You'll need the preg_quote version if your input expression contains full stops, unless you escape them manually.

e.g. if your input expression was google.co.uk, you could use either:

$inputExpression = "google.co.uk";  #just this changed... very easy!
$pattern = "@^https?://".preg_quote($inputExpression,"@")."(?:\:(?:80|443))?/?$@";

or the following, manually escaping the full stops (not the best way):
$pattern = "@^https?://google\.co\.uk(?:\:(?:80|443))?/?$@";
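
As a rough illustration of why the escaping matters, the sketch below builds the pattern with and without preg_quote and tests both against a look-alike host. The names $escaped, $unescaped and $lookalike are invented for this example. An unescaped full stop matches any single character, so the unquoted pattern accepts hosts it should not.

```php
<?php
$inputExpression = 'google.co.uk';

// preg_quote puts a backslash in front of each full stop.
$quoted = preg_quote($inputExpression, '@');
echo $quoted, PHP_EOL; // google\.co\.uk

$escaped   = '@^https?://' . $quoted . '(?::(?:80|443))?/?$@';
$unescaped = '@^https?://' . $inputExpression . '(?::(?:80|443))?/?$@';

// A made-up host where other characters stand in for the full stops.
$lookalike = 'http://googleXco-uk/';
var_dump(preg_match($escaped, $lookalike));   // int(0)
var_dump(preg_match($unescaped, $lookalike)); // int(1)
```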
aish13

ASKER

Hi - Thanks a lot for the solution. The URL pattern you gave produced the following results:

"@^https?://mycompanytrain(?:\:(?:80|443))?/?$@"

MATCHES
http://mycompanytrain
http://mycompanytrain/
http://mycompanytrain:80
http://mycompanytrain:80/
https://mycompanytrain
https://mycompanytrain/
https://mycompanytrain:443
https://mycompanytrain:443/

DID NOT MATCH
http://172.21.141.208
http://172.21.141.208/
http://172.21.141.208:80
http://172.21.141.208:80/
https://172.21.141.208
https://172.21.141.208/
https://172.21.141.208:443
https://172.21.141.208:443/

Can you please change the pattern so that it matches the IP address as well?

Regards
Ashish
SOLUTION
This solution is only available to members.
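
The posted solution is not visible here, so the following is only one possible approach and not necessarily what was suggested. The IP URLs failed because the pattern hardcodes a single hostname. A sketch that accepts either value builds an alternation from a list of allowed hosts; $allowedHosts and $alternation are hypothetical names, and the two hosts are the ones quoted in the thread.

```php
<?php
// Hypothetical allow-list: the hostname and IP address quoted in the thread.
$allowedHosts = ['mycompanytrain', '172.21.141.208'];

// Escape each host so it is matched literally, then join into one alternation.
$alternation = implode('|', array_map(
    function ($host) {
        return preg_quote($host, '@');
    },
    $allowedHosts
));

$pattern = '@^https?://(?:' . $alternation . ')(?::(?:80|443))?/?$@';

$urls = [
    'http://mycompanytrain',
    'https://172.21.141.208:443/',
    'http://172.21.141.208/helloworld',
    'http://172x21x141x208',
];
foreach ($urls as $url) {
    echo $url, ' => ', preg_match($pattern, $url) === 1 ? 'match' : 'no match', PHP_EOL;
}
```

The first two URLs should match and the last two should not. To accept any IPv4 address instead of one fixed address, the alternation would need a generic IP sub-pattern, which this sketch does not attempt.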
aish13

ASKER

Thanks a lot for help...