skij asked:
PHP/REGEX: Get Domain / Subdomain from URLs
Using PHP, how can I get the domain / subdomain(s) from a URL, without any prefixes or suffixes?
$urls = array('http://my.example.com','example.com','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
    echo $url; // I only want the domain / subdomain!
}
Try this
<?php
$urls = array('http://my.example.com','example.com','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
    $urlinfo = parse_url($url);
    echo "<pre>" . print_r($urlinfo, true) . "</pre>";
    echo "Domain : " . $urlinfo['host'];
}
ASKER
Julian, your idea does not work because some of the URLs do not have a scheme prefix, so parse_url() does not return a 'host' key for them:
Notice: Undefined index
A regex might be needed.
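Here's an example. You're right on the edge of what a regular expression can do here, so take it with a grain of salt: the pattern below only knows about a short hard-coded list of TLDs (com, net, org), and regular expressions are not all-powerful when it comes to parsing URLs.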
http://iconoun.com/demo/temp_skij.php
<?php // demo/temp_skij.php
/**
* http://www.experts-exchange.com/questions/28699566/PHP-REGEX-Get-Domain-Subdomain-from-URLs.html
* http://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
*/
error_reporting(E_ALL);
echo '<pre>';
// A LIST OF STRINGS TO SEARCH
$urls = array
( 'http://my.example.com'
, 'example.com'
, 'ftps://my.example.com'
, 'https://sub.domain.example.com/mypage?hello'
)
;
// CREATE A REGULAR EXPRESSION TO SEARCH EACH STRING
$rgx
= '#' // REGEX DELIMITER
. '(' // START PROTOCOL CAPTURE GROUP
. 'https?|ftps?' // STRING LITERALS - OPTIONAL "S"
. ')' // END CAPTURE GROUP
. '?' // OPTIONAL PROTOCOL
. '(' // START CAPTURE GROUP
. '://' // STRING LITERAL
. ')' // END CAPTURE GROUP
. '?' // OPTIONAL
. '(' // START SUBDOMAIN/DOMAIN CAPTURE GROUP
. '.' // ANY CHARACTER
. '+?' // ONE OR MORE UNGREEDY
. ')' // END CAPTURE GROUP
. '(' // START TLD CAPTURE GROUP
. '\.' // ESCAPED DOT STRING LITERAL
. '(?:com|net|org)' // STRING LITERALS - NON-CAPTURING GROUP SO THE DOT APPLIES TO ALL THREE
. ')' // END CAPTURE GROUP
. '#' // REGEX DELIMITER
. 'i' // MODIFIER FLAG - CASE-INSENSITIVE
;
foreach ($urls as $url)
{
    echo PHP_EOL . "<b>$url</b>";
    if (preg_match($rgx, $url, $match))
    {
        echo PHP_EOL . "FOUND REGEX: $rgx" . PHP_EOL;
        var_dump($match);
        // SHOW THE SUBDOMAIN AND DOMAIN WITH THE TLD
        echo PHP_EOL . $match[3] . $match[4] . PHP_EOL;
    }
}
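The pattern above depends on a fixed list of TLDs. As a rough alternative, here is a minimal sketch of a TLD-agnostic pattern, assuming the host is simply everything between an optional "scheme://" prefix and the first "/", "?", "#" or ":" character. The function name extract_host_regex is hypothetical (it is not part of the demo script), the nullable return type assumes PHP 7.1 or later, and the sketch does not try to handle user:password@ prefixes or IPv6 literals.

```php
<?php
// Hypothetical helper, not from the original demo: pull the host out of a
// URL-like string with one anchored regular expression.
function extract_host_regex(string $url): ?string
{
    $rgx
    = '~^'                         // REGEX DELIMITER, ANCHOR AT START OF STRING
    . '(?:[a-z][a-z0-9+.-]*://)?'  // OPTIONAL SCHEME SUCH AS http:// OR ftps://
    . '([^/?#:]+)'                 // HOST: ONE OR MORE CHARACTERS BEFORE /, ?, # OR :
    . '~i'                         // REGEX DELIMITER, CASE-INSENSITIVE
    ;
    // Host names are case-insensitive, so normalize to lower case.
    return preg_match($rgx, $url, $match) ? strtolower($match[1]) : null;
}

$urls = array('http://my.example.com','example.com','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
    echo $url . ' => ' . extract_host_regex($url) . PHP_EOL;
}
```

The delimiter is "~" rather than "#" so that "#" can appear unescaped inside the character class.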
Here's what I got from parse_url(). The only failure was the string without a scheme: parse_url() reports it under "path" instead of "host". Of course, that's not really a URL!
http://my.example.com
array(2) {
["scheme"]=>
string(4) "http"
["host"]=>
string(14) "my.example.com"
}
example.com
array(1) {
["path"]=>
string(11) "example.com"
}
ftps://my.example.com
array(2) {
["scheme"]=>
string(4) "ftps"
["host"]=>
string(14) "my.example.com"
}
https://sub.domain.example.com/mypage?hello
array(4) {
["scheme"]=>
string(5) "https"
["host"]=>
string(22) "sub.domain.example.com"
["path"]=>
string(7) "/mypage"
["query"]=>
string(5) "hello"
}
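The dump above also hints at a workaround that avoids regular expressions: the scheme-less string lands in "path" only because nothing marks where the host starts. A minimal sketch, assuming it is acceptable to prepend "//" whenever the input contains no "://"; the helper name get_host is hypothetical, and the nullable return type and ?? operator assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper: return the host of a URL-like string, or null.
function get_host(string $url): ?string
{
    // Without a scheme, parse_url() treats the whole string as a path.
    // A leading "//" turns it into a scheme-relative URL so the host is found.
    if (strpos($url, '://') === false) {
        $url = '//' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() gives null when the component is missing and false on
    // seriously malformed input; fold both cases into null.
    return is_string($host) ? $host : null;
}

$urls = array('http://my.example.com','example.com','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
    echo get_host($url) ?? '(no host)', PHP_EOL;
}
```

One caveat with this check: a scheme-less string that carries "://" later on (for example inside a query string) is passed to parse_url() unchanged and will still come back without a host.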