Get domain name from URL in PHP

Hi,
I'm trying to write code that extracts the domain name from a list of URLs. The URLs are stored in a .txt file, one per line. The following code works for most of the URLs, but not for ones ending in .co.uk and similar suffixes.

<?php
if (isset($_POST['submit']))
{
	// one URL per line; drop the newline characters and any blank lines
	$lines = file($_FILES['domainUploadFile']['tmp_name'], FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

	foreach ($lines as $url) {
		// host = everything after an optional http:// or https:// up to the first /, ? or #
		if (!preg_match('@^(?:https?://)?([^/?#]+)@i', trim($url), $matches)) {
			continue;
		}
		$host = $matches[1];

		// get last two segments of host name (this is the part that fails for .co.uk)
		if (preg_match('/[^.]+\.[^.]+$/', $host, $matches)) {
			echo "domain name is: " . htmlspecialchars($matches[0]) . "<br />\n";
		}
	}
}
?>


This is the input:

adnan.com/?att=1&att=2&att3
abc.net
www.giwww.com
http://sites.google.com/
http://www.banksy.co.uk/
http://en.wikipedia.org/wiki/Site

And this is the output:

domain name is: adnan.com
domain name is: abc.net
domain name is: giwww.com
domain name is: google.com
domain name is: co.uk
domain name is: wikipedia.org
Asked by hiddenpearls

hernst42 commented:
Or simply use in your loop:
echo str_replace('www.', '', parse_url($url, PHP_URL_HOST));
 
hiddenpearls (Author) commented:
PHP_URL_HOST doesn't work when the URL has no scheme, e.g. adnan.com/?att=1&att=2&att3 (without http://).
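The parse_url() idea can still work for scheme-less lines if a scheme is added before parsing. Below is a minimal sketch, assuming that any line without a scheme is an http URL; hostFromUrl is a hypothetical helper name. It also anchors the "www." removal to the start of the host, because str_replace('www.', '', ...) removes every occurrence and would turn www.giwww.com into gicom.

```php
<?php
// Sketch: extract the host even when the URL has no scheme.
// Assumption: a line without "scheme://" is treated as an http URL.
function hostFromUrl(string $url): ?string
{
    $url = trim($url);
    // parse_url() only reports a host when a scheme is present,
    // so prepend one if it is missing.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no usable host
    }
    // Strip "www." only at the start of the host, unlike str_replace().
    return preg_replace('/^www\./i', '', strtolower($host));
}

$urls = [
    'adnan.com/?att=1&att=2&att3',
    'abc.net',
    'www.giwww.com',
    'http://sites.google.com/',
    'http://www.banksy.co.uk/',
    'http://en.wikipedia.org/wiki/Site',
];

foreach ($urls as $url) {
    echo hostFromUrl($url), "\n";
}
```

This keeps subdomains such as sites. and en.; only a leading "www." is dropped, so www.banksy.co.uk comes out as banksy.co.uk.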
 
käµfm³d 👽 commented:
This is extremely difficult to do with a regex, as you have already seen. URLs can have multiple parts, TLDs can be anywhere from 2 to 6 (maybe more) characters long, and suffixes such as co.uk themselves consist of multiple parts.

Is there any way to sort the URLs into general categories based on how they are constructed? For example, given your examples above, we could say you have 2 categories:

[server].[domain].[tld]
[server].[domain].[co].[country_code]
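The two categories above can be turned into a simple rule: keep the last two labels for [domain].[tld], and the last three when the final two labels form a known multi-part suffix such as co.uk. Below is a minimal sketch that takes a bare host name (for example, one returned by parse_url()). The function name registrableDomain and the three-entry suffix list are hypothetical and deliberately incomplete; a complete list of such suffixes is published as the Public Suffix List.

```php
<?php
// Sketch of the category rule: [server].[domain].[tld] keeps 2 labels,
// [server].[domain].[co].[country_code] keeps 3.
// HYPOTHETICAL: $multiLabelSuffixes is a tiny hand-made list for illustration.
function registrableDomain(string $host, array $multiLabelSuffixes): string
{
    $labels = explode('.', strtolower(trim($host, '.')));
    $count = count($labels);
    if ($count <= 2) {
        return implode('.', $labels);
    }
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    // Keep three labels when the last two form a known multi-part suffix.
    $keep = in_array($lastTwo, $multiLabelSuffixes, true) ? 3 : 2;
    return implode('.', array_slice($labels, -$keep));
}

$suffixes = ['co.uk', 'org.uk', 'com.au']; // hypothetical, incomplete list

foreach (['sites.google.com', 'www.banksy.co.uk', 'en.wikipedia.org', 'www.giwww.com'] as $host) {
    echo registrableDomain($host, $suffixes), "\n";
}
```

The rule is only as good as the suffix list: any multi-part suffix missing from it falls back to the last two labels.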
 
Marco Gasi (Freelancer) commented:
Try with regex:

$subject = "adnan.com/?att=1&att=2&att3
abc.net
www.giwww.com
http://sites.google.com/
http://www.banksy.co.uk/
http://en.wikipedia.org/wiki/Site";
// preg_match() stops at the first match; preg_match_all() collects every domain
preg_match_all('/\b([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b/ix', $subject, $matches);

echo "<pre>";
var_dump($matches);

To get the individual items, do:

foreach ($matches[0] as $domain) {
  echo $domain . "<br />";
}

Cheers

Scott Madeira commented:
This is ugly code, but I think it can be cleaned up to do what you want. It works with your data set, except that www.banksy.co.uk still comes out as co.uk.
<?php

$a[]= 'adnan.com/?att=1&att=2&att3';
$a[]= 'abc.net';
$a[]= 'www.giwww.com';
$a[]= 'http://sites.google.com/';
$a[]= 'http://www.banksy.co.uk/';
$a[]= 'http://en.wikipedia.org/wiki/Site';

foreach ($a as $url) {
	$x = parse_url($url);

	// print_r($x);

	if (is_array($x) && array_key_exists('host', $x)) {
		// URL has a scheme, so parse_url() found a host: keep its last two parts
		$parts = explode('.', $x['host']);
		$new_url1 = array_pop($parts);
		$new_url = array_pop($parts) . '.' . $new_url1;
	} else if (is_array($x) && array_key_exists('path', $x)) {
		// no scheme, so parse_url() put the domain in 'path'
		$parts = explode('.', $x['path']);
		// get last two parts of the domain name
		$part1 = array_pop($parts);
		$part1 = array_pop($parts) . '.' . $part1;
		// account for a potential trailing slash
		$part2 = explode('/', $part1);
		$new_url = $part2[0];
	} else {
		echo 'invalid' . PHP_EOL;
		continue; // $new_url is not set for this URL, so skip printing it
	}
	print_r($new_url);
	echo PHP_EOL;
}
?>
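The host/path split above exists because parse_url() puts scheme-less input into 'path'. A cleaned-up sketch, assuming every line without a scheme is an http URL, needs only the host branch; lastTwoLabels is a hypothetical name. It deliberately keeps the same last-two-labels rule, so it does not solve the .co.uk case on its own.

```php
<?php
// Cleaned-up sketch of the two-branch approach.
// Assumption: a line without "scheme://" is an http URL, so prepending
// "http://" lets parse_url() return a host and the 'path' branch is unneeded.
function lastTwoLabels(string $url): ?string
{
    $url = trim($url);
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // corresponds to the 'invalid' case
    }
    // Same rule as the original: keep only the last two labels of the host.
    return implode('.', array_slice(explode('.', $host), -2));
}
```

For example, lastTwoLabels('adnan.com/?att=1&att=2&att3') gives adnan.com, while lastTwoLabels('http://www.banksy.co.uk/') still gives co.uk.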

Marco Gasi (Freelancer) commented:
Can you give some feedback, please?

Ray Paseur commented:
Looking at http://en.wikipedia.org/wiki/Site, which you want to turn into wikipedia.org, makes me wonder why you want to discard the "en" part of the name. The subdomain is fairly important, as is banksy in http://www.banksy.co.uk/.

What is the desired output from the examples?