PHP/REGEX: Get Domain / Subdomain from URLs

Using PHP, how can I get the domain / subdomain(s) from a URL, without any prefixes or suffixes?
$urls = array('http://my.example.com','example.com','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
 echo $url; // I only want the domain / subdomain!
}

skij Asked:
Ray Paseur Commented:
See if this article helps.  I used a similar requirement as the basis for the discussion of test-driven development.
http://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
Julian Hansen Commented:
Try this:
<?php
$urls = array('http://my.example.com','example.com','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
  $urlinfo = parse_url($url);
  echo "<pre>" . print_r($urlinfo, true) . "</pre>";
  echo "Domain : " . $urlinfo['host'];
}

skij (Author) Commented:
Julian, your idea does not work because some of the URLs do not have a scheme prefix. For those, parse_url() returns no 'host' key, so the echo fails:
Notice: Undefined index

A regex might be needed.
Ray Paseur Commented:
Here's an example.  You're right on the edge of what you can do with REGEX here, so take it with a grain of salt.  In some contexts regular expressions are not all-powerful.
http://iconoun.com/demo/temp_skij.php
<?php // demo/temp_skij.php

/**
 * http://www.experts-exchange.com/questions/28699566/PHP-REGEX-Get-Domain-Subdomain-from-URLs.html
 * http://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';


// A LIST OF STRINGS TO SEARCH
$urls = array
( 'http://my.example.com'
, 'example.com'
, 'ftps://my.example.com'
, 'https://sub.domain.example.com/mypage?hello'
)
;


// CREATE A REGULAR EXPRESSION TO SEARCH EACH STRING
$rgx
= '#'              // REGEX DELIMITER

. '('              // START PROTOCOL CAPTURE GROUP
. 'https?|ftps?'   // STRING LITERALS - OPTIONAL "S"
. ')'              // END CAPTURE GROUP
. '?'              // OPTIONAL PROTOCOL

. '('              // START CAPTURE GROUP
. '://'            // STRING LITERAL
. ')'              // END CAPTURE GROUP
. '?'              // OPTIONAL

. '('              // START SUBDOMAIN/DOMAIN CAPTURE GROUP
. '.'              // ANY CHARACTER
. '+?'             // ONE OR MORE UNGREEDY
. ')'              // END CAPTURE GROUP

. '('              // START TLD CAPTURE GROUP
. '\.'             // ESCAPED DOT STRING LITERAL
. '(?:com|net|org)' // NON-CAPTURING GROUP SO THE DOT APPLIES TO EVERY ALTERNATIVE
. ')'              // END CAPTURE GROUP

. '#'              // REGEX DELIMITER
. 'i'              // MODIFIER FLAG - CASE-INSENSITIVE
;

foreach ($urls as $url)
{
    echo PHP_EOL . "<b>$url</b>";
    if (preg_match($rgx, $url, $match))
    {
        echo PHP_EOL . "FOUND REGEX: $rgx" . PHP_EOL;
        var_dump($match);

        // SHOW THE SUBDOMAIN AND DOMAIN WITH THE TLD
        echo PHP_EOL . $match[3] . $match[4] . PHP_EOL;
    }
}
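
The pattern above only recognises .com, .net and .org, and it is not anchored to the start of the string. As a rough alternative, here is a minimal sketch that skips an optional scheme and captures everything up to the first slash, question mark, hash or colon. The function name host_from_string is hypothetical, the type hints assume PHP 7.1 or later, and the pattern does not handle user:password@ prefixes or IPv6 literals.

```php
<?php
// Hypothetical helper (not from the thread): capture the host part of a string.
function host_from_string(string $url): ?string
{
    $rgx = '~'                          // REGEX DELIMITER
         . '^'                          // ANCHOR AT START OF STRING
         . '(?:[a-z][a-z0-9+.-]*://)?'  // OPTIONAL SCHEME SUCH AS http:// OR ftps://
         . '([^/?#:]+)'                 // HOST: EVERYTHING UP TO / ? # OR :
         . '~i';                        // DELIMITER AND CASE-INSENSITIVE FLAG

    if (preg_match($rgx, $url, $match)) {
        return $match[1];
    }
    return null; // nothing that looks like a host was found
}

$urls = array('http://my.example.com', 'example.com', 'ftps://my.example.com', 'https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
    echo $url . ' => ' . host_from_string($url) . PHP_EOL;
}
```

Because the host is defined by what ends it rather than by a list of TLDs, this sketch accepts any suffix, but it also accepts strings that are not valid hostnames at all.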

Ray Paseur Commented:
Here's what I got from parse_url().  The only failure was the URL without a protocol.  Of course, that's not really a URL!
http://my.example.com
array(2) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(14) "my.example.com"
}

example.com
array(1) {
  ["path"]=>
  string(11) "example.com"
}

ftps://my.example.com
array(2) {
  ["scheme"]=>
  string(4) "ftps"
  ["host"]=>
  string(14) "my.example.com"
}

https://sub.domain.example.com/mypage?hello
array(4) {
  ["scheme"]=>
  string(5) "https"
  ["host"]=>
  string(22) "sub.domain.example.com"
  ["path"]=>
  string(7) "/mypage"
  ["query"]=>
  string(5) "hello"
}
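
If only the host is of interest, parse_url() also accepts a component constant as its second argument. The short sketch below (the sample strings are taken from the question) shows how the scheme-less case could be detected: with PHP_URL_HOST the function returns the host string, or null when the string has no host component.

```php
<?php
// Ask parse_url() for the host only instead of the whole array.
$urls = array('http://my.example.com', 'example.com');
foreach ($urls as $url) {
    $host = parse_url($url, PHP_URL_HOST);
    // var_export() makes a null result visible in the output
    echo $url . ' => ' . var_export($host, true) . PHP_EOL;
}
```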

Julian Hansen Commented:
Regex will work, but there is more than one way to flay the cat.

If your array contains URLs and parse_url() does not return a host, the simplest option is to assume the string is already a bare hostname and use it as is.
That breaks when a path follows the hostname, so a more robust option is to prepend "http://" and call parse_url() again, like so:
$urls = array('http://my.example.com','example.com/somefile.html','ftps://my.example.com','https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
  $urlinfo = parse_url($url);
  if (!isset($urlinfo['host'])) {
    $urlinfo = parse_url('http://' . $url);
    if (!isset($urlinfo['host'])) {
      echo "Bad URL";
      continue; // no host even with a scheme, so skip to the next URL
    }
  }
  echo "<pre>" . print_r($urlinfo, true) . "</pre>";
  echo "Domain : " . $urlinfo['host'];
}
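
The same fallback can be wrapped in a small function so the caller gets either a host or null. This is only a sketch: the name extract_host is hypothetical, the type hints assume PHP 7.1 or later, and "http://" is an assumed default scheme used purely to make parse_url() recognise the host.

```php
<?php
// Hypothetical helper (not from the thread): host via parse_url() with a scheme fallback.
function extract_host(string $url): ?string
{
    $info = parse_url($url);
    if (!isset($info['host'])) {
        // No host found, so assume the scheme is missing and retry.
        $info = parse_url('http://' . $url);
    }
    // parse_url() may return false or an array without 'host'; both yield null here.
    return isset($info['host']) ? $info['host'] : null;
}

$urls = array('http://my.example.com', 'example.com/somefile.html', 'ftps://my.example.com', 'https://sub.domain.example.com/mypage?hello');
foreach ($urls as $url) {
    $host = extract_host($url);
    echo $url . ' => ' . ($host === null ? 'Bad URL' : $host) . PHP_EOL;
}
```

Returning null instead of echoing inside the function leaves it to the caller to decide how a bad URL should be reported.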
