Jerry L asked:

Regex: Remove Subdomain from URL

I'm using PHP 5.x and I need to remove the http://subdomain. prefix from my URLs. The subdomain can contain letters [a-zA-Z] and digits [0-9].

I've got this much,
<?php
      \$full_URL = get_bloginfo('wpurl') ;
      \$http_URL = str_replace(\"http://www.\",\"\",\$full_URL) ;
      \$sub_URL = str_replace(\"http://\",\"\",\$http_URL) ;
      \$root_URL = str_replace( ?? ) ;
?>

but that leaves this,
      subdomain.root.com

I need to also remove the subdomain and the dot "." so what remains is,
      root.com

Thanks for your help.
ASKER CERTIFIED SOLUTION
Pratima

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.

Jerry L (ASKER)

Thanks, but can you convert that to the "str_replace" syntax with proper escape characters where needed? This code resides inside a Wordpress "page" using the execphp plugin.
You won't be able to do this using only str_replace() unless you know what the exact subdomain string is. For more complex replacement rules (i.e. regex) you have to use preg_replace().

If you do know the subdomain name you are trying to strip out, you could do this:

$subdomain = 'mysubdomain';
$mynewurl = str_replace("http://", '', $myurl);
$mynewurl = str_replace("$subdomain.", '', $mynewurl);

I do appreciate that if execphp does not support preg functions then this leaves you a little stuck.
That said, I can't see why you can't use preg unless it is not available in your php installation (which is unlikely).
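
To make the difference concrete, here is a minimal sketch. The helper name strip_first_label and the sample URLs are hypothetical, and it assumes exactly one label sits in front of the root domain. str_replace() treats its search argument as literal text, while preg_replace() interprets it as a pattern:

```php
<?php
// Hypothetical helper: removes "http://" (or "https://") plus the first
// dot-terminated label, e.g. "http://subdomain.root.com" -> "root.com".
// Assumption: exactly one label precedes the root domain.
function strip_first_label($url)
{
    return preg_replace('~^https?://[a-z0-9-]+\.~i', '', $url);
}

// str_replace() searches for the literal text "[a-z0-9]+." and finds nothing,
// so the input comes back unchanged.
$literal = str_replace('[a-z0-9]+.', '', 'subdomain.root.com');

echo $literal, "\n";
echo strip_first_label('http://subdomain.root.com'), "\n";
echo strip_first_label('http://www.Root.com'), "\n";
```

The first line printed is still subdomain.root.com; the other two have the scheme and first label removed.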
Perhaps use substr ?:

<?php
      $url = "http://devd.domainnamffe.co.uk";
        echo substr ($url,strpos($url,".")+1)."<br>";
        $url = "devd.domainnamffe.co.uk";
        echo substr ($url,strpos($url,".")+1);

?>
PS: ignore the above, it's incorrect.
What about something like this?
<?php
      $full_URL = get_bloginfo('wpurl') ;
      $pos = strpos($full_URL, "root.com");
      $clean_URL = substr($full_URL, $pos);
?>


Jerry L (ASKER)

The problem with the above solution is I don't know what the root string is. It is not "root.com", and it can be anything, depending on what domain the script is running on. This needs to be generic.

So, basically, here's what I have so far:
 - I have stripped the http:// from the URL which provides $sub_URL
 - Now I need to strip all characters from the left to the next "dot".

This:  
      subdomain.root.com

Needs to become this:
      root.com

I just don't know the syntax to do this. Also, since it's being passed from within a Wordpress page, all the special characters need to be escaped.

      \$root_URL = str_replace ( \"\^\[a-z,A-Z,0-9\]\",\"\",\$sub_URL ) ;      // All alpha characters
      \$root_URL = str_replace( \"\.\",\"\",\$root_URL ) ;                            // the "dot"
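
The step described above, stripping everything from the left up to the first dot, can be sketched without any regex using strpos() and substr(). The helper name strip_to_first_dot is hypothetical, and the code is shown unescaped, i.e. as it would look outside a double-quoted string:

```php
<?php
// Hypothetical helper: drops everything up to and including the first dot.
// Returns the input unchanged when it contains no dot at all.
function strip_to_first_dot($host)
{
    $pos = strpos($host, '.');
    if ($pos === false) {
        return $host;
    }
    return substr($host, $pos + 1);
}

echo strip_to_first_dot('subdomain.root.com'), "\n"; // root.com
```

Note that this only suits hosts that really have a leading label: given a bare root.com it would return com.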


Jerry L (ASKER)

Perhaps I can rephrase the question...

I need to isolate the "root" domain. Perhaps there is another way to do this of which I am not aware, such as with a Wordpress function, or internal PHP method.

From this:
      http://subdomain.root.com

I need to access this:
      root.com
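
One built-in route is PHP's parse_url(), which can return just the host part of a URL. The sketch below is only an illustration: the helper name root_from_url is hypothetical, and it assumes the URL has the form scheme://label.root.tld. In the WordPress page, the input would be the value returned by get_bloginfo('wpurl').

```php
<?php
// Hypothetical helper: extracts the host with parse_url() and removes its
// first label. Assumption: URLs look like scheme://label.root.tld.
function root_from_url($url)
{
    $host = parse_url($url, PHP_URL_HOST);   // e.g. "subdomain.root.com"
    if (!is_string($host)) {
        return null;                          // no host component found
    }
    $parts = explode('.', $host, 2);          // split at the first dot only
    return isset($parts[1]) ? $parts[1] : $host;
}

echo root_from_url('http://subdomain.root.com'), "\n"; // root.com
```

Because parse_url() separates the scheme and any path from the host, no separate str_replace() calls for "http://" are needed in this variant.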

....

But you're trying to use regular expressions with a function that doesn't use such. Or am I not understanding how str_replace works?


>>  Now I need to strip all characters from the left to the next "dot"

How would you intend to handle multiple sub-domains? For instance:  www.hub1.example.com ?
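
If multiple sub-domains did have to be handled, one possible approach is to keep only the last two dot-separated labels. This is a rough sketch with a hypothetical helper name (last_two_labels), and it gives the wrong answer for suffixes such as co.uk, which would need a list of public suffixes:

```php
<?php
// Hypothetical helper: keeps only the last two dot-separated labels.
// Limitation: wrong for multi-part suffixes such as "co.uk".
function last_two_labels($host)
{
    $labels = explode('.', $host);
    return implode('.', array_slice($labels, -2));
}

echo last_two_labels('www.hub1.example.com'), "\n"; // example.com
```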
Jerry L (ASKER)

Please re-read all my posts from the beginning. It answers your question.
Jerry L (ASKER)

Also, I only need to handle the following types of URLs:

http://www.Root.com
and
http://subdomain.Root.com
SOLUTION
This solution is only available to members.

Here is a regex that does it the other way round (i.e. it loses the lowest order subdomain from the url):

preg_replace('~[a-z]\.((?:[a-z]+\.)+[a-z]+)$~i', '$1', $url);
>>  Please re-read all my posts from the beginning. It answers your question.

Perhaps you should re-read mine. pratima_mcs gave you an example of how to use preg_replace and you asked to convert it to str_replace. Again I ask, is there some magic about str_replace you are privy to that the rest of the world is not?
Jerry L (ASKER)

ollyatstithians, your two suggestions do not seem to be working, or perhaps I'm not using it correctly. It's not stripping the subdomain. Can you suggest a fix for it?

<?php
      \$full_URL = get_bloginfo( 'wpurl' ) ;
      \$strip_WWW = str_replace( 'http://www.','',\$full_URL ) ;
      \$strip_HTTP = str_replace( 'http://','',\$strip_WWW ) ;

      \$rootDomain1 = preg_replace( '~.([a-z]+.[a-z]+)~i', '\$1', \$strip_HTTP ) ;
      \$rootDomain2 = preg_replace( '~[a-z]\.((?:[a-z]+\.)+[a-z]+)\$~i', '\$1', \$strip_HTTP ) ;
?>

<?php echo \$full_URL ; ?>
<?php echo \$strip_HTTP ; ?>
<?php echo \$rootDomain1 ; ?>
<?php echo \$rootDomain2 ; ?>

[ I will comment on the other issues and suggestions by pratima_mcs and kaufmed in my next post. ]
Looking at the Exec-PHP FAQ, I can't see any mention of having to escape $ characters. Have you tried it without the escapes?
I'll install the plugin and try it out myself.
Also, please post the urls you tried that didn't work.
Here is the code in the WP post and its outputted text.
I did get the regex a bit wrong, but there is no need to escape variable names.

The code:  
Normal text
<?php
  $url = 'http://www.donkey.bites.com';
  $rootdomain = preg_replace('~.*\.([a-z]+\.[a-z]+)~i', '$1', $url);
  $rootdomain2 = preg_replace('~.*\.((?:[a-z]+\.)+[a-z]+)$~i', '$1', $url);
?>
<p><?php echo $url; ?></p>
<p><?php echo $rootdomain; ?></p>
<p><?php echo $rootdomain2; ?></p>



The output:  
Normal text

http://www.donkey.bites.com

bites.com

bites.com


SOLUTION
This solution is only available to members.

kaufmed:
I agree. Well spotted.
Jerry L (ASKER)

Thanks to everyone for their input.

Here's the solution I was able to come up with:

SOLUTION

ConfigFile.php
(this file contains variables used by the main script to create the Wordpress page.)
$pageContent = "(variable definition begins with double quotes) Then lots of text including html tags.
 Then the php code as follows that must be rendered as text on the web page (using php-exec plugin)...
<?php
	\$full_URL = get_bloginfo( 'wpurl' ) ;
	\$strip_www = str_replace( 'http://www.','',\$full_URL ) ;
	\$strip_http = str_replace( 'http://','',\$strip_www ) ;
	\$rootDomain = preg_replace( '/^(.*?)\.(.*)\$/','\$2', \$full_URL ) ;

	echo '<br />' . \$strip_www ;
	echo '<br />' . \$strip_http ;
	echo '<br />' . \$rootDomain ; 
?>

More page text. Variable definition terminated with closing double quotes."



NOTES

1. Although neither piece of code provided by ollyatstithians worked, those contributions helped me sort out the regular expression syntax and determine which characters needed to be escaped. I was then able to apply that knowledge to the original suggestion by pratima_mcs.

Thanks also to kaufmed for valuable contributions.

2. Single quotes do not need to be escaped, as long as the surrounding block uses double quotes, e.g. $myVariable = "text & html content plus php code".
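
The escaping behaviour behind these notes can be seen in a small stand-alone sketch. The variable names are hypothetical. Inside a double-quoted PHP string, an unescaped $name is interpolated immediately, while \$ keeps the dollar sign as literal text for later execution, and single quotes pass through untouched:

```php
<?php
$url = 'ignored';

// Escaped: the dollar sign survives, so the string still contains "$url".
$escaped = "echo \$url . 'text';";

// Unescaped: PHP substitutes the current value of $url right away.
$unescaped = "echo $url . 'text';";

echo $escaped, "\n";   // echo $url . 'text';
echo $unescaped, "\n"; // echo ignored . 'text';
```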