
PHP/REGEX: Get Domain / Subdomain from URLs

Using PHP, how can I get the domain / subdomain(s) from a URL, without any prefixes or suffixes?

$urls = array('http://my.example.com','example.com','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
 echo $url; // I only want the domain / subdomain!
}


Ray Paseur

See if this article helps. I used a similar requirement as the basis for the discussion of test-driven development.
https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html

Julian Hansen

Try this

<?php
$urls = array('http://my.example.com','example.com','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
  $urlinfo = parse_url($url);
  echo "<pre>" . print_r($urlinfo, true) . "</pre>";
  echo "Domain : " . $urlinfo['host'];
}


skij

ASKER

Julian, your idea does not work because some of the URLs have no scheme prefix (for example, 'example.com'), so $urlinfo['host'] is not set:
Notice: Undefined index

A regex might be needed.

Ray Paseur

Here's an example. You're right on the edge of what you can do with REGEX here, so take it with a grain of salt: the pattern only recognizes the TLDs listed in it (com, net, org), and regular expressions are not all-powerful in every context.
http://iconoun.com/demo/temp_skij.php

<?php // demo/temp_skij.php

/**
 * http://www.experts-exchange.com/questions/28699566/PHP-REGEX-Get-Domain-Subdomain-from-URLs.html
 * http://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';


// A LIST OF STRINGS TO SEARCH
$urls = array
( 'http://my.example.com'
, 'example.com'
, 'ftps://my.example.com'
, 'https://sub.domain.example.com/mypage?hello'
)
;


// CREATE A REGULAR EXPRESSION TO SEARCH EACH STRING
$rgx
= '#'              // REGEX DELIMITER

. '('              // START PROTOCOL CAPTURE GROUP
. 'https?|ftps?'   // STRING LITERALS - OPTIONAL "S"
. ')'              // END CAPTURE GROUP
. '?'              // OPTIONAL PROTOCOL

. '('              // START CAPTURE GROUP
. '://'            // STRING LITERAL
. ')'              // END CAPTURE GROUP
. '?'              // OPTIONAL

. '('              // START SUBDOMAIN/DOMAIN CAPTURE GROUP
. '.'              // ANY CHARACTER
. '+?'             // ONE OR MORE UNGREEDY
. ')'              // END CAPTURE GROUP

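. '('              // START TLD CAPTURE GROUP
. '\.'             // ESCAPED DOT STRING LITERAL
. '(?:com|net|org)' // STRING LITERALS - GROUPED SO THE DOT APPLIES TO ALL THREE
. '\b'             // WORD BOUNDARY - DO NOT STOP AT ".com" INSIDE ".company"
. ')'              // END CAPTURE GROUP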

. '#'              // REGEX DELIMITER
. 'i'              // MODIFIER FLAG - CASE-INSENSITIVE
;

foreach ($urls as $url)
{
    echo PHP_EOL . "<b>$url</b>";
    if (preg_match($rgx, $url, $match))
    {
        echo PHP_EOL . "FOUND REGEX: $rgx" . PHP_EOL;
        var_dump($match);

        // SHOW THE SUBDOMAIN AND DOMAIN WITH THE TLD
        echo PHP_EOL . $match[3] . $match[4] . PHP_EOL;
    }
}
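
The pattern above depends on a fixed list of TLDs. A minimal sketch of an alternative, assuming the host is simply everything between an optional scheme and the first slash, colon, question mark, or hash, is shown here. The function name host_by_regex() is hypothetical, and the sketch does not handle user:password@ prefixes or IPv6 literals.

```php
<?php
// Hypothetical helper (not part of the original answer).
// Returns the host part of a URL-like string, or null if nothing matches.
function host_by_regex(string $url): ?string
{
    $rgx
    = '~'                         // REGEX DELIMITER
    . '^'                         // START OF STRING
    . '(?:[a-z][a-z0-9+.-]*://)?' // OPTIONAL SCHEME SUCH AS http:// OR ftps://
    . '([^/:?#]+)'                // HOST: EVERYTHING UP TO / : ? OR #
    . '~i'                        // DELIMITER AND CASE-INSENSITIVE FLAG
    ;
    return preg_match($rgx, $url, $match) ? $match[1] : null;
}

$urls = array('http://my.example.com','example.com','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
    echo host_by_regex($url) . PHP_EOL;
}
```

Because the pattern no longer looks for a TLD, it also accepts hosts such as localhost or names under other TLDs. Whether that is desirable depends on the input.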


Ray Paseur

Here's what I got from parse_url(). The only failure was the URL without a protocol. Of course, that's not really a URL!

http://my.example.com
array(2) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(14) "my.example.com"
}

example.com
array(1) {
  ["path"]=>
  string(11) "example.com"
}

ftps://my.example.com
array(2) {
  ["scheme"]=>
  string(4) "ftps"
  ["host"]=>
  string(14) "my.example.com"
}

https://sub.domain.example.com/mypage?hello
array(4) {
  ["scheme"]=>
  string(5) "https"
  ["host"]=>
  string(22) "sub.domain.example.com"
  ["path"]=>
  string(7) "/mypage"
  ["query"]=>
  string(5) "hello"
}
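
Since parse_url() only fills in 'host' when a scheme is present, one possible workaround is to prepend a placeholder scheme before parsing. The following is a sketch under that assumption, not the accepted solution from this thread. The function name host_from_url() and the choice of 'http://' as the placeholder are illustrative.

```php
<?php
// Hypothetical helper (not from the thread).
// Returns the host part of a URL-like string, or null when none is found.
function host_from_url(string $url): ?string
{
    // If the string does not start with "scheme://", add a placeholder
    // scheme so that parse_url() treats the first segment as the host.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . ltrim($url, '/');
    }

    // With a component flag, parse_url() returns a string, null when the
    // component is absent, or false for seriously malformed input.
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? $host : null;
}

$urls = array('http://my.example.com','example.com','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
    echo host_from_url($url) . PHP_EOL;
}
```

This keeps the parsing in parse_url() and uses a regular expression only to detect whether a scheme is already there.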


ASKER CERTIFIED SOLUTION

Julian Hansen
