  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1648
  • Last Modified:

Extract domain from URL in PHP

I am trying to write a PHP script to return the domain from the host name.

For example, all of these would return 'google':
google.com
www.google.com
google.co.uk
www.google.co.uk
www.google.org
google.org

Thanks,

Chris
Asked by: sypder
3 Solutions
 
MasonWolfCommented:
I can't think of any regular expression matching that would do this perfectly, but if you can be certain that the domain to be matched will be at least 4 characters long, here's one that could work.

$domain_elements = explode('.', $_SERVER['HTTP_HOST']); // host name sent with the request
array_pop($domain_elements); // the last element is never the value you want
do {
    $domain = array_pop($domain_elements);
} while (strlen($domain) < 4 && !empty($domain_elements));

Now, on a site like "aa.com" this won't work. Same with "msn.com". But the majority of domain names are at least 4 characters, so if a 98% solution is acceptable, then here you go.
 
sypderAuthor Commented:
Yeah, I had thought about that one. But we do a lot of advertising with MSN, so we would need to catch that one. Of course, I could make an exemption.

Right now I basically have one of these 98% solutions, and I was looking to make it perfect, which should be doable. I figured there would be a really smart way to do it.

Chris
 
AlexanderRCommented:
Do you take subdomains into account? Does maps.google.com need to return google as well?

 
etullyCommented:
if (preg_match('/.*\.([^.]+)\.[^.]+$/', $z, $regs)) $answer = $regs[1];
if (preg_match('/^([^.]+)\.[^.]+$/', $z, $regs)) $answer = $regs[1];
echo $answer;
 
etullyCommented:
Oops. Precede those lines with:

$z = "insert.your.domain.here";
 
etullyCommented:
This one is better:

<?
$z = "asdf.maps.google.co.uk";

if (preg_match('/^([^.]+\.)*([^.]+)\.(com|net|org|gov|mil|co\.uk)$/', $z, $regs)) echo $regs[2];

?>
 
MasonWolfCommented:
Ok, try this instead:

$domain_elements = explode('.', $_SERVER['HTTP_HOST']);
array_pop($domain_elements); // the last element is never the value you want
$domain = array_pop($domain_elements);
if ($domain == "co")
     $domain = array_pop($domain_elements);

Since the only case where there are going to be two top-level domain parts is when there's a "co" in front, this solution will work in every situation except where the return value really is supposed to be "co" (e.g. "www.co.com").
 
sypderAuthor Commented:
MasonWolf,

I think you are on to something good. How about:

http://www.dwp.gov.uk/ the domain in this case is dwp
http://www.nationaltrust.org.uk the domain in this case is nationaltrust

I presume the only real solution is to make a list of all the top level domains and remove those?
 
MasonWolfCommented:
Well, different countries treat their TLDs differently.

www.com.tv is a perfectly legit domain (you can type it in and see for yourself - some Japanese cutesy page advertising a phone)

The point is, the only way to be 100% certain of the domain is to do the same thing the browsers do and trace the dns route beginning with the top domain and working down to an individual website.

Unfortunately, I don't know how to do that.
 
sypderAuthor Commented:
Thanks MasonWolf,

I will do some googling and see how feasible what you are proposing is. In the meantime, I am using the function you have.
 
etullyCommented:
<trace the dns route beginning with the top domain...Unfortunately, I don't know how to do that.>

It would take an hour or two to code that... but it's pretty much unnecessary. The code should simply have an array at the beginning listing the TLDs. This way, sypder can add new ones over time when the situation arises.

I mean, if you need it to be perfect on day one, then it's a pretty big project. If you can take something that is 99% perfect and fine-tune it over time by adding a few new TLDs when you learn of them, then it's a much easier project. Like this:

<?php

$tlds = array("com","net","org","co.uk");

$z = "asdf.maps.google.co.uk";

foreach ($tlds as $t) {
        // preg_quote() escapes the dot in entries such as "co.uk"
        $pattern = '/^(.*\.)*([^.]+)\.' . preg_quote($t, '/') . '$/';
        if (preg_match($pattern, $z, $regs)) { $domain = $regs[2]; }
}
 
?>


Just add as many TLDs to the array as you want.
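
The same suffix-list idea can be sketched with plain string functions. This is only an illustration: the function name `registeredLabel` is hypothetical, and the suffix list (including `gov.uk` and `org.uk`, taken from the examples earlier in the thread) is an assumption, not a complete list of TLDs. Longer suffixes are tried first so that `co.uk` would win over a bare `uk` entry if one were added.

```php
<?php
// Hypothetical helper: returns the label directly to the left of the longest
// matching suffix, or null when no suffix in the list matches.
function registeredLabel(string $host, array $suffixes): ?string
{
    $host = strtolower(rtrim($host, '.'));

    // Try longer suffixes first so "co.uk" is preferred over a shorter match.
    usort($suffixes, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        // The host must end with ".suffix" and have at least one label before it.
        if (strlen($host) > strlen($tail) && substr($host, -strlen($tail)) === $tail) {
            $rest = substr($host, 0, -strlen($tail));
            $labels = explode('.', $rest);
            return end($labels);
        }
    }
    return null;
}

// Illustrative list only (an assumption); extend it as new suffixes turn up.
$suffixes = array('com', 'net', 'org', 'co.uk', 'gov.uk', 'org.uk');

echo registeredLabel('asdf.maps.google.co.uk', $suffixes), "\n";
echo registeredLabel('www.dwp.gov.uk', $suffixes), "\n";
echo registeredLabel('msn.com', $suffixes), "\n";
```

Because the match is on the suffix rather than on label length, short names such as msn.com need no special exemption; the accuracy depends entirely on how complete the list is.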
 
etullyCommented:
echo $domain;
 
MasonWolfCommented:
Thanks for the grade and the points. I hope you can get a 100% solution working soon. If I happen to run across a method to trace the dns route, I'll post it here.

By the way, this question pushed me over into my first EE rank. :) So thanks again and good luck!
0
 
karamillaCommented:
I hope that this function is useful for anyone

function getDomainFromUrl($url){

      $url = str_replace('http://', '', $url);
      $url = str_replace('www.', '', $url);
      
      $domain_elements = explode('/',$url);
      return $domain_elements[0];
}
 
sypderAuthor Commented:
karamilla, I think this might not work. For example:

http://login.experts-exchange.com

The "domain" is experts-exchange.com, but the suggested script would return login.experts-exchange.com
 
karamillaCommented:
sypder, thanks for your reply.
I'm sorry that this function doesn't help you.

On my site I have services that other sites can use, so I want to collect statistics about those sites.

1. It's important to know the subdomain name, for example:
http://www.amalatawy.blogspot.com
So it would not be useful to know only the domain name, because blogspot.com could have a lot of subdomains.

2. The URL argument will be a complete URL, because my script will be opened from other sites, for example:
http://www.amalatawy.blogspot.com/news/details.php?id=?????
Note that there can be more dots after the domain name, as in "details.php", and we don't know how many there will be.

Thanks for your note, it was useful to me.
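
Since the input in this case is a complete URL with a path and query string, one option is to let PHP's built-in `parse_url()` isolate the host first, so that dots in something like `details.php` never come into play. A small sketch under stated assumptions: the function name `hostFromUrl` is hypothetical, and it assumes the URL may arrive with or without a scheme.

```php
<?php
// Hypothetical helper: returns the lowercase host of a URL without a leading
// "www.", or null if no host can be found.
function hostFromUrl(string $url): ?string
{
    // parse_url() only recognises the host when a scheme is present,
    // so prepend one if the caller passed a bare "example.com/path".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    $host = strtolower($host);
    // Strip only a leading "www." label, not "www." elsewhere in the string.
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }
    return $host;
}

echo hostFromUrl('http://www.amalatawy.blogspot.com/news/details.php?id=1'), "\n";
echo hostFromUrl('https://login.experts-exchange.com'), "\n";
```

This keeps the subdomain, which suits the statistics use case described above; reducing the result further to the registered domain would still need a suffix list like the one discussed earlier in the thread.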