Solved

Regex: Remove Subdomain from URL

Posted on 2011-03-20
Last Modified: 2012-05-11
I'm using PHP 5.x and I need to remove the http://subdomain. prefix from my URLs. The subdomain can contain letters [a-zA-Z] and numbers [0-9].

I've got this much,
<?php
      \$full_URL = get_bloginfo('wpurl') ;
      \$http_URL = str_replace(\"http://www.\",\"\",\$full_URL) ;
      \$sub_URL = str_replace(\"http://\",\"\",\$http_URL) ;
      \$root_URL = str_replace( ?? ) ;
?>

but that leaves this,
      subdomain.root.com

I need to also remove the subdomain and the dot "." so what remains is,
      root.com

Thanks for your help.
Question by:WizeOwl
 
LVL 39

Accepted Solution

by:
Pratima Pharande earned 300 total points
ID: 35178518
function strip_out_subdomain($domain)
{
      $only_my_domain = preg_replace("/^(.*?)\.(.*)$/","$2",$domain);
      return $only_my_domain;
}
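To see what the accepted pattern actually does, here is a small usage sketch. The sample host names are assumptions chosen for illustration, and the function is repeated only so the block runs on its own. The lazy `(.*?)` stops at the first dot, so exactly one leading label is removed, whatever it is.

```php
<?php
// Same logic as the accepted answer, repeated so this sketch is self-contained.
function strip_out_subdomain($domain)
{
    // Group 1 (lazy) is everything before the first dot; group 2 is the rest.
    return preg_replace('/^(.*?)\.(.*)$/', '$2', $domain);
}

// Sample inputs are illustrative assumptions, not values from the thread's site.
$one_subdomain  = strip_out_subdomain('subdomain.root.com');   // "root.com"
$two_subdomains = strip_out_subdomain('www.hub1.example.com'); // "hub1.example.com"
$no_subdomain   = strip_out_subdomain('root.com');             // "com"
```

Note the last case: if the input has no subdomain at all, the root name itself is removed, so this is only safe when a leading label is always present.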
 

Author Comment

by:WizeOwl
ID: 35178541
Thanks, but can you convert that to the "str_replace" syntax with proper escape characters where needed? This code resides inside a Wordpress "page" using the execphp plugin.
 
LVL 10

Expert Comment

by:ollyatstithians
ID: 35179650
You won't be able to do this using only str_replace() unless you know the exact subdomain string. For more complex replacement rules (i.e. regex) you have to use preg_replace().

If you do know the subdomain name you are trying to strip out, you could do this:

$subdomain = 'mysubdomain';
$mynewurl = str_replace("http://", '', $myurl);
$mynewurl = str_replace("$subdomain.", '', $mynewurl);

I do appreciate that if execphp does not support preg functions then this leaves you a little stuck.
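The difference between the two functions can be sketched as follows. The sample host and the variable names are assumptions for illustration. str_replace() searches for its first argument as literal text, while preg_replace() interprets it as a pattern.

```php
<?php
$host = 'subdomain.root.com'; // illustrative sample value

// str_replace() looks for the literal characters "[a-z0-9]+." in the string.
// They do not occur, so the input comes back unchanged.
$literal = str_replace('[a-z0-9]+.', '', $host);

// preg_replace() interprets the pattern: one leading label plus its dot is removed.
$regex = preg_replace('/^[a-z0-9-]+\./i', '', $host);
```

Here `$literal` is still the full host name, while `$regex` holds only the part after the first dot.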
 
LVL 10

Expert Comment

by:ollyatstithians
ID: 35179662
That said, I can't see why you can't use preg unless it is not available in your php installation (which is unlikely).
 
LVL 13

Expert Comment

by:darren-w-
ID: 35179996
Perhaps use substr ?:

<?php
      $url = "http://devd.domainnamffe.co.uk";
      echo substr($url, strpos($url, ".") + 1) . "<br>";
      $url = "devd.domainnamffe.co.uk";
      echo substr($url, strpos($url, ".") + 1);
?>
 
LVL 13

Expert Comment

by:darren-w-
ID: 35180393
P.S. Ignore the above; it's incorrect.
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 35180610
What about something like this?
<?php
      $full_URL = get_bloginfo('wpurl') ;
      $pos = strpos($full_URL, "root.com");
      $clean_URL = substr($full_URL, $pos);
?>

 

Author Comment

by:WizeOwl
ID: 35184592
The problem with the above solution is I don't know what the root string is. It is not "root.com", and it can be anything, depending on what domain the script is running on. This needs to be generic.

So, basically, here's what I have so far:
 - I have stripped the http:// from the URL which provides $sub_URL
 - Now I need to strip all characters from the left to the next "dot".

This:  
      subdomain.root.com

Needs to become this:
      root.com

I just don't know the syntax to do this. Also, since it's being passed from within a Wordpress page, all the special characters need to be escaped.

      \$root_URL = str_replace ( \"\^\[a-z,A-Z,0-9\]\",\"\",\$sub_URL ) ;      // All alpha characters
      \$root_URL = str_replace( \"\.\",\"\",\$root_URL ) ;                            // the "dot"


 

Author Comment

by:WizeOwl
ID: 35184640
Perhaps I can rephrase the question...

I need to isolate the "root" domain. Perhaps there is another way to do this of which I am not aware, such as with a Wordpress function, or internal PHP method.

From this:
      http://subdomain.root.com

I need to access this:
      root.com
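One possible "internal PHP method" is parse_url(), which extracts the host without any regex. This is a minimal sketch and not something proposed in the thread. The helper name `strip_first_label` is hypothetical, and it assumes the URL always carries exactly one leading label (www or a subdomain) to drop.

```php
<?php
// Hypothetical helper: returns the host of $url with its first label removed.
function strip_first_label($url)
{
    // parse_url() returns null or false when no host can be extracted.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return '';
    }

    // Cut everything up to and including the first dot.
    $dot = strpos($host, '.');
    return $dot === false ? $host : substr($host, $dot + 1);
}

$from_subdomain = strip_first_label('http://subdomain.root.com'); // "root.com"
$from_www       = strip_first_label('http://www.Root.com');       // "Root.com"
```

In a WordPress page, the argument would presumably be the value returned by get_bloginfo('wpurl').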

 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 35184664
....

But you're trying to use regular expressions with a function that doesn't support them. Or am I not understanding how str_replace works?


>>  Now I need to strip all characters from the left to the next "dot"

How would you intend to handle multiple sub-domains? For instance:  www.hub1.example.com ?
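The multiple-subdomain case can be sketched like this, using the example host from the comment above. The variable names are assumptions. The two strategies discussed in this thread give different answers once more than one subdomain is present.

```php
<?php
$host = 'www.hub1.example.com';

// Strategy 1: remove only the first label (what "strip to the next dot" does).
$first_label_removed = preg_replace('/^[^.]+\./', '', $host); // "hub1.example.com"

// Strategy 2: keep only the last two labels, however many come before them.
$last_two_labels = implode('.', array_slice(explode('.', $host), -2)); // "example.com"
```

Neither strategy is aware of two-part suffixes such as .co.uk, so both are only a rough fit for hosts under a simple TLD like .com.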
 

Author Comment

by:WizeOwl
ID: 35186636
Please re-read all my posts from the beginning. It answers your question.
 

Author Comment

by:WizeOwl
ID: 35186994
Also, I only need to handle the following types of URLs:

http://www.Root.com
and
http://subdomain.Root.com
 
LVL 10

Assisted Solution

by:ollyatstithians
ollyatstithians earned 100 total points
ID: 35187646
OK, how about this:
Assuming that the domain is a .com (or a tld with a non reserved sub-domain, so NOT .co.uk or similar) you can use:

$rootdomain = preg_replace('~\.([a-z]+\.[a-z]+)~i', '$1', $url);

so if you want to eval() that code (which is what I think your plugin is doing) you need to escape the quotes:

$stringtoeval = '$rootdomain = preg_replace(\'~\.([a-z]+\.[a-z]+)~i\', \'$1\', $url);';

You should only need to escape $ characters in double quoted strings.

Just to reiterate what kaufmed said: You cannot use regular expressions with str_replace(). At all.
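The quoting rule mentioned above can be checked with a short sketch. The variable names and the shortened pattern `~x~` are placeholders chosen for illustration. In a single-quoted string only the quotes need escaping, while in a double-quoted string it is the `$` characters that need a backslash.

```php
<?php
// Single-quoted: $ is literal, but embedded single quotes must be escaped.
$single = '$rootdomain = preg_replace(\'~x~\', \'$1\', $url);';

// Double-quoted: embedded single quotes are fine, but $ must be escaped
// so PHP does not try to interpolate a variable.
$double = "\$rootdomain = preg_replace('~x~', '\$1', \$url);";

// Both strings now contain exactly the same PHP source text.
$same = ($single === $double);
```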
 
LVL 10

Expert Comment

by:ollyatstithians
ID: 35187664
Here is a regex that does it the other way round (i.e. it loses the lowest order subdomain from the url):

preg_replace('~[a-z]\.((?:[a-z]+\.)+[a-z]+)$~i', '$1', $url);
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 35190295
>>  Please re-read all my posts from the beginning. It answers your question.

Perhaps you should re-read mine. pratima_mcs gave you an example of how to use preg_replace and you asked to convert it to str_replace. Again I ask, is there some magic about str_replace you are privy to that the rest of the world is not?
 

Author Comment

by:WizeOwl
ID: 35194800
ollyatstithians, your two suggestions do not seem to be working, or perhaps I'm not using them correctly. They are not stripping the subdomain. Can you suggest a fix?

<?php
      \$full_URL = get_bloginfo( 'wpurl' ) ;
      \$strip_WWW = str_replace( 'http://www.','',\$full_URL ) ;
      \$strip_HTTP = str_replace( 'http://','',\$strip_WWW ) ;

      \$rootDomain1 = preg_replace( '~.([a-z]+.[a-z]+)~i', '\$1', \$strip_HTTP ) ;
      \$rootDomain2 = preg_replace( '~[a-z]\.((?:[a-z]+\.)+[a-z]+)\$~i', '\$1', \$strip_HTTP ) ;
?>

<?php echo \$full_URL ; ?>
<?php echo \$strip_HTTP ; ?>
<?php echo \$rootDomain1 ; ?>
<?php echo \$rootDomain2 ; ?>

[ I will comment on the other issues and suggestions by pratima_mcs and kaufmed in my next post. ]
 
LVL 10

Expert Comment

by:ollyatstithians
ID: 35196836
Looking at the Exec-PHP FAQ, I can't see any mention of having to escape $ characters. Have you tried it without the escapes?
I'll install the plugin and try it out myself.
 
LVL 10

Expert Comment

by:ollyatstithians
ID: 35196840
Also, please post the urls you tried that didn't work.
 
LVL 10

Expert Comment

by:ollyatstithians
ID: 35197036
Here is the code in the WP post and its outputted text.
I did get the regex a bit wrong, but there is no need to escape variable names.

The code:  
Normal text
<?php
  $url = 'http://www.donkey.bites.com';
  $rootdomain = preg_replace('~.*\.([a-z]+\.[a-z]+)~i', '$1', $url);
  $rootdomain2 = preg_replace('~.*\.((?:[a-z]+\.)+[a-z]+)$~i', '$1', $url);
?>
<p><?php echo $url; ?></p>
<p><?php echo $rootdomain; ?></p>
<p><?php echo $rootdomain2; ?></p>


The output:  
Normal text

http://www.donkey.bites.com

bites.com

bites.com

 
LVL 74

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 100 total points
ID: 35197608
@ollyatstithians

>>  Here is the code in the WP post and its outputted text.

What about the site:  www.extreme1.com ?

Domain names' valid character list is:  a-zA-Z0-9-

You would need to modify the pattern to accommodate. All "[a-z]" should become "[a-z0-9-]" (leaving out A-Z since you added the "i" modifier).
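The suggested change can be sketched as follows, using the www.extreme1.com example from this comment. The variable names are assumptions. The first call uses the letters-only pattern posted earlier in the thread; the second widens each character class and anchors the match at the end of the string.

```php
<?php
$url = 'http://www.extreme1.com';

// Letters-only pattern: "extreme1" contains a digit, so nothing matches
// and preg_replace() returns the URL unchanged.
$letters_only = preg_replace('~.*\.([a-z]+\.[a-z]+)~i', '$1', $url);

// Widened classes: letters, digits and hyphens are accepted in each label.
$with_digits = preg_replace('~.*\.([a-z0-9-]+\.[a-z0-9-]+)$~i', '$1', $url);
```

With the widened pattern the result is "extreme1.com"; a URL with no subdomain at all is still left unchanged because the pattern requires a dot before the two captured labels.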
 
LVL 10

Expert Comment

by:ollyatstithians
ID: 35197975
kaufmed:
I agree. Well spotted.
 

Author Comment

by:WizeOwl
ID: 35204719
Thanks to everyone for their input.

Here's the solution I was able to come up with:

SOLUTION

ConfigFile.php
(this file contains variables used by the main script to create the Wordpress page.)
$pageContent = "(variable definition begins with double quotes) Then lots of text including HTML tags.
 Then the php code as follows that must be rendered as text on the web page (using php-exec plugin)...
<?php
	\$full_URL = get_bloginfo( 'wpurl' ) ;
	\$strip_www = str_replace( 'http://www.','',\$full_URL ) ;
	\$strip_http = str_replace( 'http://','',\$strip_www ) ;
	\$rootDomain = preg_replace( '/^(.*?)\.(.*)\$/','\$2', \$full_URL ) ;

	echo '<br />' . \$strip_www ;
	echo '<br />' . \$strip_http ;
	echo '<br />' . \$rootDomain ; 
?>

More page text. Variable definition terminated with closing double quotes."


NOTES

1. Although neither piece of code provided by ollyatstithians worked, those contributions helped me sort out the regular expression syntax and determine which characters needed to be escaped. I was then able to apply that knowledge to the original suggestion by pratima_mcs.

Thanks also to kaufmed for valuable contributions.

2. Single quotes do not need to be escaped, as long as the surrounding block uses double quotes: $myVariable = "text & html content plus php code".
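As a possible alternative to escaping every `$`, a nowdoc string takes its content literally. This is only a sketch under the assumption that the server runs PHP 5.3 or later (nowdoc is not available before that); the thread only states "PHP 5.x". The variable names `$escaped` and `$nowdoc` are illustrative.

```php
<?php
// The escaped form used in the solution: every $ carries a backslash
// because the surrounding string is double-quoted.
$escaped = "\$rootDomain = preg_replace( '/^(.*?)\.(.*)\$/','\$2', \$full_URL ) ;";

// Nowdoc form (assumes PHP 5.3+): nothing inside is interpolated or unescaped,
// so the code can be pasted exactly as it should appear on the page.
$nowdoc = <<<'CODE'
$rootDomain = preg_replace( '/^(.*?)\.(.*)$/','$2', $full_URL ) ;
CODE;

// Both variables hold the same text.
$identical = ($escaped === $nowdoc);
```

The closing `CODE;` marker is kept at the start of its own line, which older PHP versions require.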