skij (Canada) asked:

PHP/REGEX: Get Domain / Subdomain from URLs

Using PHP, how can I get just the domain / subdomain(s) from a URL, without any prefix (such as the protocol) or suffix (such as the path or query string)?
$urls = array('http://my.example.com','example.com','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
 echo $url; // I only want the domain / subdomain!
}


Ray Paseur (United States):

See if this article helps.  I used a similar requirement as the basis for the discussion of test-driven development.
https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
Julian Hansen:
Try this:
<?php
$urls = array('http://my.example.com','example.com','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
  // parse_url() splits the URL into its components (scheme, host, path, query, ...)
  $urlinfo = parse_url($url);
  echo "<pre>" . print_r($urlinfo, true) . "</pre>";
  echo "Domain : " . $urlinfo['host'];
}
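If only the host is needed, parse_url() also accepts a component constant as its second argument. With PHP_URL_HOST it returns the host as a string, or null when the string has no host, so there is no array key to go missing. This is a minimal sketch; the helper name hostOnly() is hypothetical and only used for illustration.

```php
<?php
// Sketch: request only the host component from parse_url().
// hostOnly() is a hypothetical helper name used for illustration.
function hostOnly(string $url): ?string
{
    // With PHP_URL_HOST, parse_url() returns a string, null (no host),
    // or false (seriously malformed URL); map anything but a string to null.
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? $host : null;
}

$urls = array('http://my.example.com', 'example.com', 'ftps://my.example.com', 'https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
    // var_export() prints NULL for a missing host instead of raising a notice
    echo $url . ' => ' . var_export(hostOnly($url), true) . PHP_EOL;
}
```

Note that the scheme-less string still yields no host this way; it only avoids the undefined-index notice.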


skij (ASKER):

Julian, your idea does not work because some of the URLs do not have a protocol prefix:
Notice: Undefined index

A regular expression (REGEX) might be needed.

Here's an example. You're right on the edge of what REGEX can do here, so take it with a grain of salt; regular expressions are not all-powerful in every context.
http://iconoun.com/demo/temp_skij.php
<?php // demo/temp_skij.php

/**
 * http://www.experts-exchange.com/questions/28699566/PHP-REGEX-Get-Domain-Subdomain-from-URLs.html
 * http://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';


// A LIST OF STRINGS TO SEARCH
$urls = array
( 'http://my.example.com'
, 'example.com'
, 'ftps://my.example.com'
, 'https://sub.domain.example.com/mypage?hello'
)
;


// CREATE A REGULAR EXPRESSION TO SEARCH EACH STRING
$rgx
= '#'              // REGEX DELIMITER

. '('              // START PROTOCOL CAPTURE GROUP
. 'https?|ftps?'   // STRING LITERALS - OPTIONAL "S"
. ')'              // END CAPTURE GROUP
. '?'              // OPTIONAL PROTOCOL

. '('              // START CAPTURE GROUP
. '://'            // STRING LITERAL
. ')'              // END CAPTURE GROUP
. '?'              // OPTIONAL

. '('              // START SUBDOMAIN/DOMAIN CAPTURE GROUP
. '.'              // ANY CHARACTER
. '+?'             // ONE OR MORE UNGREEDY
. ')'              // END CAPTURE GROUP

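. '('              // START TLD CAPTURE GROUP
. '\.'             // ESCAPED DOT STRING LITERAL
. '(?:com|net|org)' // STRING LITERALS, GROUPED SO THE DOT APPLIES TO EACH ONE
. ')'              // END CAPTURE GROUP
. '\b'             // WORD BOUNDARY, SO ".net" CANNOT MATCH INSIDE ".network"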

. '#'              // REGEX DELIMITER
. 'i'              // MODIFIER FLAG - CASE-INSENSITIVE
;

foreach ($urls as $url)
{
    echo PHP_EOL . "<b>$url</b>";
    if (preg_match($rgx, $url, $match))
    {
        echo PHP_EOL . "FOUND REGEX: $rgx" . PHP_EOL;
        var_dump($match);

        // SHOW THE SUBDOMAIN AND DOMAIN WITH THE TLD
        echo PHP_EOL . $match[3] . $match[4] . PHP_EOL;
    }
}
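For comparison, here is a sketch of a different pattern (a swapped-in technique, not the one above): it strips an optional "scheme://" prefix and captures everything up to the first "/", "?", "#", or ":". Because it does not rely on a fixed com|net|org list, it works for any TLD, but it does no validation, and for URLs with "user:pass@" credentials it would return the user name. The helper name hostFromString() is hypothetical.

```php
<?php
// Sketch: capture the host without a hard-coded TLD list.
// hostFromString() is a hypothetical helper name used for illustration.
function hostFromString(string $url): ?string
{
    $rgx = '~'                          // REGEX DELIMITER
         . '^'                          // START OF STRING
         . '(?:[a-z][a-z0-9+.\-]*://)?' // OPTIONAL SCHEME, E.G. "https://"
         . '([^/?#:]+)'                 // HOST: EVERYTHING UP TO THE FIRST / ? # OR :
         . '~i';                        // MODIFIER FLAG - CASE-INSENSITIVE
    return preg_match($rgx, $url, $match) ? $match[1] : null;
}

$urls = array('http://my.example.com', 'example.com', 'ftps://my.example.com', 'https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
    echo $url . ' => ' . hostFromString($url) . PHP_EOL;
}
```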


Here's what I got from parse_url(). The only failure was the string without a protocol, and of course that's not really a URL! parse_url() returns it as a "path" rather than a "host".
http://my.example.com
array(2) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(14) "my.example.com"
}

example.com
array(1) {
  ["path"]=>
  string(11) "example.com"
}

ftps://my.example.com
array(2) {
  ["scheme"]=>
  string(4) "ftps"
  ["host"]=>
  string(14) "my.example.com"
}

https://sub.domain.example.com/mypage?hello
array(4) {
  ["scheme"]=>
  string(5) "https"
  ["host"]=>
  string(22) "sub.domain.example.com"
  ["path"]=>
  string(7) "/mypage"
  ["query"]=>
  string(5) "hello"
}
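Since parse_url() only misses the host here because "example.com" has no protocol, one possible workaround (an assumption, not something posted in this thread) is to prepend a placeholder scheme when none is present and then read the host. This sketch assumes that any string without a "scheme://" prefix begins with a host name; the helper name getHost() is hypothetical.

```php
<?php
// Sketch: add a placeholder scheme to scheme-less input, then let parse_url()
// find the host. ASSUMPTION: input without "scheme://" begins with a host name.
// getHost() is a hypothetical helper name used for illustration.
function getHost(string $url): ?string
{
    // A scheme is a letter followed by letters, digits, "+", "." or "-" (RFC 3986)
    if (!preg_match('~^[a-z][a-z0-9+.\-]*://~i', $url)) {
        $url = 'http://' . $url; // placeholder scheme, only used for parsing
    }
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? $host : null;
}

$urls = array('http://my.example.com', 'example.com', 'ftps://my.example.com', 'https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
    echo $url . ' => ' . getHost($url) . PHP_EOL;
}
```

The design choice here is to keep parse_url() doing the real parsing and use a regular expression only to detect whether a scheme is present.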


ASKER CERTIFIED SOLUTION
Julian Hansen (South Africa)
