PHP/REGEX: Get Domain / Subdomain from URLs

Using PHP, how can I get the domain / subdomain(s) from a URL, without any prefixes or suffixes?
$urls = array('http://my.example.com','example.com','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
 echo $url; // I only want the domain / subdomain!
}


skij Asked:

Ray Paseur Commented:
See if this article helps.  I used a similar requirement as the basis for the discussion of test-driven development.
http://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
Julian Hansen Commented:
Try this
<?php
$urls = array('http://my.example.com','example.com','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
  $urlinfo = parse_url($url);
  echo "<pre>" . print_r($urlinfo, true) . "</pre>";
  echo "Domain : " . $urlinfo['host'];
}


skij (Author) Commented:
Julian, your idea does not work because some of the entries do not have a protocol prefix, so parse_url() returns no 'host' key for them:
Notice: Undefined index

REGEX might be needed.
Ray Paseur Commented:
Here's an example.  You're right on the edge of what you can do with REGEX here, so take it with a grain of salt.  In some contexts regular expressions are not all-powerful.
http://iconoun.com/demo/temp_skij.php
<?php // demo/temp_skij.php

/**
 * http://www.experts-exchange.com/questions/28699566/PHP-REGEX-Get-Domain-Subdomain-from-URLs.html
 * http://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';


// A LIST OF STRINGS TO SEARCH
$urls = array
( 'http://my.example.com'
, 'example.com'
, 'ftps://my.example.com'
, 'https://sub.domain.example.com/mypage?hello'
)
;


// CREATE A REGULAR EXPRESSION TO SEARCH EACH STRING
$rgx
= '#'              // REGEX DELIMITER

. '('              // START PROTOCOL CAPTURE GROUP
. 'https?|ftps?'   // STRING LITERALS - OPTIONAL "S"
. ')'              // END CAPTURE GROUP
. '?'              // OPTIONAL PROTOCOL

. '('              // START CAPTURE GROUP
. '://'            // STRING LITERAL
. ')'              // END CAPTURE GROUP
. '?'              // OPTIONAL

. '('              // START SUBDOMAIN/DOMAIN CAPTURE GROUP
. '.'              // ANY CHARACTER
. '+?'             // ONE OR MORE UNGREEDY
. ')'              // END CAPTURE GROUP

. '('              // START TLD CAPTURE GROUP
. '\.'             // ESCAPED DOT STRING LITERAL
. '(?:com|net|org)' // TLD ALTERNATIVES, GROUPED SO THE DOT APPLIES TO EACH
. ')'              // END CAPTURE GROUP

. '#'              // REGEX DELIMITER
. 'i'              // MODIFIER FLAG - CASE-INSENSITIVE
;

foreach ($urls as $url)
{
    echo PHP_EOL . "<b>$url</b>";
    if (preg_match($rgx, $url, $match))
    {
        echo PHP_EOL . "FOUND REGEX: $rgx" . PHP_EOL;
        var_dump($match);

        // SHOW THE SUBDOMAIN AND DOMAIN WITH THE TLD
        echo PHP_EOL . $match[3] . $match[4] . PHP_EOL;
    }
}
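
The regex above depends on a fixed list of TLDs. As a hedged alternative, here is a minimal sketch that captures everything between an optional scheme and the first "/", "?", "#" or ":" instead. The pattern is my own illustration, not taken from the thread, and it assumes the input has no "user:pass@" part and no IPv6 literal.

```php
<?php
// Sketch only: assumes no "user:pass@" part and no IPv6 literal in the input.
$rgx
= '~'                          // REGEX DELIMITER
. '^(?:[a-z][a-z0-9+.\-]*://)?' // OPTIONAL SCHEME, NOT CAPTURED
. '([^/?#:]+)'                 // HOST: UP TO THE FIRST / ? # OR :
. '~i'                         // DELIMITER AND CASE-INSENSITIVE FLAG
;

$urls = array
( 'http://my.example.com'
, 'example.com'
, 'ftps://my.example.com'
, 'https://sub.domain.example.com/mypage?hello'
)
;

// COLLECT THE CAPTURED HOST FOR EACH STRING
$found = array();
foreach ($urls as $url)
{
    if (preg_match($rgx, $url, $match))
    {
        $found[] = $match[1];
        echo $match[1] . PHP_EOL;
    }
}
```

Because the host is defined by what terminates it rather than by its ending, this does not need updating for other TLDs, but it also does no validation of the hostname itself.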


Ray Paseur Commented:
Here's what I got from parse_url().  The only failure was the URL without a protocol.  Of course, that's not really a URL!
http://my.example.com
array(2) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(14) "my.example.com"
}

example.com
array(1) {
  ["path"]=>
  string(11) "example.com"
}

ftps://my.example.com
array(2) {
  ["scheme"]=>
  string(4) "ftps"
  ["host"]=>
  string(14) "my.example.com"
}

https://sub.domain.example.com/mypage?hello
array(4) {
  ["scheme"]=>
  string(5) "https"
  ["host"]=>
  string(22) "sub.domain.example.com"
  ["path"]=>
  string(7) "/mypage"
  ["query"]=>
  string(5) "hello"
}
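
The 'example.com' result above shows parse_url() treating a string without a scheme as a path, which is where the "Undefined index" notice comes from. A minimal sketch of detecting that case without touching a missing array key, using the optional component argument of parse_url() (the $hosts array is only there for illustration):

```php
<?php
// Ask parse_url() for the host component only. It returns null when the
// component is absent, as with the scheme-less 'example.com'.
$urls = array('http://my.example.com', 'example.com');

$hosts = array();
foreach ($urls as $url) {
    $host = parse_url($url, PHP_URL_HOST);
    $hosts[$url] = $host;
    echo $url . ' => ' . var_export($host, true) . PHP_EOL;
}
```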


Julian Hansen Commented:
Regex will work, but there is more than one way to skin a cat.

First, if parse_url() does not return a host for an entry, you can assume the entry is already a bare hostname and use it as is.
Failing that (for example, when a path follows the hostname), prepend http:// to it and call parse_url() again, like so:
$urls = array('http://my.example.com','example.com/somefile.html','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
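  $urlinfo = parse_url($url);
  if (!isset($urlinfo['host'])) {
    // No host found: assume the scheme is missing, prepend one and retry
    $urlinfo = parse_url('http://' . $url);
    if (!isset($urlinfo['host'])) {
      echo "Bad URL";
      continue; // skip the output below, which needs $urlinfo['host']
    }
  }
  echo "<pre>" . print_r($urlinfo, true) . "</pre>";
  echo "Domain : " . $urlinfo['host'];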
}
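
The same fallback can be wrapped in a small function so the caller gets either a hostname or null. This is a sketch under assumptions: host_from_url() is a hypothetical helper name, not a PHP built-in, and it assumes that any entry without a scheme is meant to be read as if it started with http://.

```php
<?php
// host_from_url() is a hypothetical helper name, not part of PHP.
// Returns the host as a string, or null if none can be found.
function host_from_url($url)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        // No host found: assume the scheme was omitted and retry.
        $host = parse_url('http://' . $url, PHP_URL_HOST);
    }
    return (is_string($host) && $host !== '') ? $host : null;
}

$urls = array('http://my.example.com', 'example.com/somefile.html', 'ftps://my.example.com', 'https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
    $host = host_from_url($url);
    echo ($host === null ? 'Bad URL' : 'Domain : ' . $host) . PHP_EOL;
}
```

Returning null rather than echoing inside the function leaves it to the caller to decide how a bad entry is reported.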

