hiddenpearls
asked on
get domain name from URL in php
hi,
I'm trying to write code that extracts the domain name from each URL in a list. The URLs are stored in a txt file. The code below works for most of the URLs, but not for ones ending in .co.uk etc.
This is the input:
adnan.com/?att=1&att=2&att 3
abc.net
www.giwww.com
http://sites.google.com/
http://www.banksy.co.uk/
http://en.wikipedia.org/wiki/Site
And this is the output:
domain name is: adnan.com
domain name is: abc.net
domain name is: giwww.com
domain name is: google.com
domain name is: co.uk
domain name is: wikipedia.org
<?php
if (isset($_POST['submit']))
{
    $lines = file($_FILES['domainUploadFile']['tmp_name']);
    foreach ($lines as $line_num => $url) {
        preg_match('@^(?:http://)?([^/]+)@i', $url, $matches);
        $host = $matches[1];
        // get last two segments of host name
        preg_match('/[^.]+\.[^.]+$/', $host, $matches);
        echo "domain name is: {$matches[0]}\n"."<br />\n";
        //echo "Line #<b>{$line_num}</b> : " . htmlspecialchars($line) . "<br />\n";
    }
}
?>
ASKER
parse_url() with PHP_URL_HOST doesn't work when the URL has no scheme, for example adnan.com/?att=1&att=2&att 3 (i.e. without http://).
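One possible workaround, as a minimal sketch: prepend a scheme when the line has none, so that parse_url() has a host to find. The helper name host_from_line is hypothetical, and treating every scheme-less line as http:// is an assumption about this particular input file.

```php
<?php
// Hypothetical helper: return the host part of a line that may lack a scheme.
function host_from_line(string $line): ?string
{
    $url = trim($line);
    if ($url === '') {
        return null;
    }
    // Assumption: a line with no "scheme://" prefix is meant as http://.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() gives false for seriously malformed URLs and null when there is no host.
    return is_string($host) ? strtolower($host) : null;
}

echo host_from_line('adnan.com/?att=1&att=2&att 3'), "\n";
echo host_from_line('http://www.banksy.co.uk/'), "\n";
```

This only recovers the full host name (www.banksy.co.uk); reducing it to the domain is a separate step.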
This is an extremely difficult thing to do with regex, as you have already witnessed. URLs can have multiple parts, and TLDs can be anywhere from 2 to 6 (maybe more) characters long and themselves consist of multiple parts.
Is there any way to sort the URLs into general categories according to how they are constructed? For example, given your examples above, we could say you have 2 categories:
[server].[domain].[tld]
[server].[domain].[co].[country_code]
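The two categories above could be handled without a single large regex by checking the last two labels of the host against a list of known multi-part suffixes. The following is only a sketch of that idea: the constant MULTI_PART_SUFFIXES and the function registrable_domain are hypothetical names, and the list is deliberately tiny and incomplete. A complete list would have to come from a maintained source such as the Public Suffix List.

```php
<?php
// Illustrative, incomplete list of multi-part suffixes (an assumption, not authoritative).
const MULTI_PART_SUFFIXES = ['co.uk', 'org.uk', 'com.au', 'co.nz'];

// Hypothetical helper: reduce a host name to domain + suffix.
function registrable_domain(string $host): string
{
    // Normalise case and drop surrounding whitespace and stray dots.
    $labels = explode('.', strtolower(trim($host, ". \t\r\n")));
    $count = count($labels);
    if ($count <= 2) {
        return implode('.', $labels);
    }
    // Keep three labels when the last two form a known suffix, otherwise two.
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $keep = in_array($lastTwo, MULTI_PART_SUFFIXES, true) ? 3 : 2;
    return implode('.', array_slice($labels, -$keep));
}

foreach (['www.banksy.co.uk', 'sites.google.com', 'en.wikipedia.org', 'abc.net'] as $host) {
    echo "domain name is: ", registrable_domain($host), "\n";
}
```

With this list, www.banksy.co.uk is reduced to banksy.co.uk instead of co.uk, while the other hosts still come out as their last two labels.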
Can you give some feedback, please?
You want to turn http://en.wikipedia.org/wiki/Site into wikipedia.org, which makes me wonder why you want to discard the "en" part of the name. The subdomain is fairly important, as is banksy in http://www.banksy.co.uk/.
What is the desired output from the examples?
echo str_replace('www.', '', parse_url($url, PHP_URL_HOST));
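One caveat with str_replace() here: it removes every occurrence of 'www.', not just a leading one, so a host like awww.example.com would become aexample.com. A minimal sketch of a variant that strips only a leading www. label follows; the function name strip_leading_www is hypothetical.

```php
<?php
// Hypothetical helper: drop a single leading "www." label, case-insensitively.
function strip_leading_www(string $host): string
{
    // The ^ anchor limits the replacement to the start of the host name.
    // preg_replace() returns null only on a regex error; fall back to the input then.
    return preg_replace('/^www\./i', '', $host) ?? $host;
}

echo strip_leading_www('www.giwww.com'), "\n";
echo strip_leading_www('awww.example.com'), "\n";
```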