


Rewriting href, src and action

askanthonys asked
Medium Priority
Last Modified: 2008-03-10
OK, here is what I am doing:

I am writing a page that fetches another page for caching purposes.

What I want to do is this:

Replace any <img src="/relative.link"> with <img src="http://mysite.com/image.php?site=http://theirdomain.com/relative.link">

I also want to do the same for href and action attributes.

There may be other HTML attributes between the start of the tag and the attribute.

The value may also be quoted differently: double quotes ("), single quotes ('), or no quotes at all.

I also want href and action links to go to their corresponding pages, such as /htmlpage.php and /formpage.php.

If possible, I would like to use regular expressions.

The domain will be stored in $domain, without a trailing '/'.

Thank you if you can help!


Also, some links may already include their domain, in which case I would only want to prepend http://mysite.com/image.php?site=
I am sure there was a quick script on php.net, but I can't find it. A quick Google search brings up something called PageForward
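A minimal sketch of the rewriting described in the question, assuming PHP 5.3 or later (for closures) and a PCRE build that supports the `(?|...)` branch-reset group. The handler URLs come from the question; sending href to htmlpage.php and action to formpage.php is an assumption, as is treating page-relative links (no leading `/`) as relative to the site root. The helper names `rewrite_url()` and `rewrite_links()` are hypothetical.

```php
<?php
// Minimal sketch (PHP 5.3+): point src/href/action attributes at handler
// scripts on mysite.com. The script names come from the question; which
// attribute goes to which script is an assumption.
function rewrite_url($attr, $url, $domain)
{
    // Leave fragments, mail links and script pseudo-URLs alone.
    if ($url === '' || preg_match('~^(#|mailto:|javascript:)~i', $url)) {
        return $url;
    }
    if (substr($url, 0, 2) === '//') {
        $target = 'http:' . $url;            // protocol-relative: //host/path
    } elseif (preg_match('~^https?://~i', $url)) {
        $target = $url;                      // already has a domain: only prepend
    } elseif ($url[0] === '/') {
        $target = $domain . $url;            // root-relative: /relative.link
    } else {
        $target = $domain . '/' . $url;      // assumption: page-relative treated as root-relative
    }
    $handlers = array(
        'src'    => 'http://mysite.com/image.php?site=',
        'href'   => 'http://mysite.com/htmlpage.php?site=',
        'action' => 'http://mysite.com/formpage.php?site=',
    );
    // urlencode() stops ?, & and = in the target from being read as
    // parameters of the handler script itself.
    return $handlers[strtolower($attr)] . urlencode($target);
}

function rewrite_links($html, $domain)
{
    // Visit every opening tag; other attributes may come before the one we want.
    return preg_replace_callback('~<[a-z][^>]*>~i', function ($tag) use ($domain) {
        // attr = "value" | 'value' | value. The (?|...) branch reset puts the
        // quote in group 4 and the value in group 5 for all three forms.
        $attr = '~(\s)(src|href|action)(\s*=\s*)(?|(")([^"]*)"|(\')([^\']*)\'|()([^\s>"\']+))~i';
        return preg_replace_callback($attr, function ($m) use ($domain) {
            $new = rewrite_url($m[2], $m[5], $domain);
            if ($new === $m[5]) {
                return $m[0];                // nothing to change
            }
            return $m[1] . $m[2] . $m[3] . '"' . $new . '"';
        }, $tag[0]);
    }, $html);
}
```

For example, with `$domain = 'http://theirdomain.com'`, the tag `<img alt="x" src='/a.gif'>` becomes `<img alt="x" src="http://mysite.com/image.php?site=http%3A%2F%2Ftheirdomain.com%2Fa.gif">`. The target URL is passed through urlencode(), which the question's example does not show; image.php can read it back from $_GET['site'], which PHP decodes automatically. A `>` inside an attribute value will cut the tag short, so this is not a full HTML parser.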


which led me to


It seems to do what you want, but it doesn't cache the pages. Since it is written in PHP, it should be easy to customise with output buffering (ob_start, ob_get_contents, ob_end_clean/ob_end_flush and ob_flush).
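The output-buffering idea can be sketched as follows: anything printed between ob_start() and ob_end_clean() is collected into a string instead of being sent to the browser. `capture_output()` is a hypothetical helper, and the echoed link stands in for whatever PageForward actually prints. ob_get_clean() combines the last two buffer calls into one.

```php
<?php
// Minimal sketch: capture everything a callback prints so the markup can be
// rewritten (or cached) before anything is sent to the browser.
function capture_output($callback)
{
    ob_start();                  // start buffering instead of sending output
    call_user_func($callback);   // everything echoed here goes into the buffer
    $html = ob_get_contents();   // copy what was buffered
    ob_end_clean();              // discard the buffer and stop buffering
    return $html;
}

// Hypothetical usage: in PageForward the echo would be the fetched page.
$page = capture_output(function () {
    echo '<a href="/index.html">Home</a>';
});
// $page is now an ordinary string: run the link rewriting on it, store it in
// a cache file, and finally send it with echo.
```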


I looked through these files, and it just seems to be an endless amount of includes and so on.

I couldn't easily find what I was looking for.

If you could tell me exactly where this code is, I would be grateful.

And just ignore the caching part; I decided it would put too much stress on my server.
I don't quite understand.

PageForward is here - http://prdownloads.sourceforge.net/pageforward/pf1.5b2.zip?download 

It's one file; you need php4-curl (a PHP module) installed for it to work. If you open the file, look for "$proxify_media = false" and change false to true; it should then also rewrite the URLs of images and script files.

If you want it to cache files, find


and add

//Cache files
$cache_files = true;
$cache_dir = "./cache";

Then find


and add

if ($cache_files){
      if ( is_file($cache_dir."/".urlencode($url)) ){
            if ( filemtime($cache_dir."/".urlencode($url)) > (time() - (5 * 60)) ){

(change the 5 to the number of minutes after which it should re-fetch a page; set it to 10 and it won't re-fetch for 10 minutes) and finally find

      echo ("\n<!-- PageForward v1.5b2 took $duration seconds to construct this page.-->");

and add

      if ($cache_files){
            $fp = fopen($cache_dir."/".urlencode($url),"w+b");
      echo ("\n<!-- PageForward v1.5b2 took $duration seconds to construct this page.-->");

It should then cache all pages in that directory. Note that this can quickly grow into a large collection of files and will probably need to be cleaned out regularly.
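The same caching logic can be written as a few standalone functions, sketched below under the same assumptions as the snippets above: one file per URL named with urlencode($url), and a copy counts as fresh if it was modified within the last N minutes (the `time() - (5 * 60)` test). The function names and the `fetch_page()` placeholder are hypothetical rather than PageForward's, and file_get_contents()/file_put_contents() (PHP 5) replace the fopen() calls.

```php
<?php
// Hypothetical helpers mirroring the snippets above: one cache file per URL,
// named with urlencode($url), fresh for $minutes minutes.
function cache_path($cache_dir, $url)
{
    return $cache_dir . '/' . urlencode($url);
}

// Returns the cached page, or false when there is no copy or it is stale.
function cache_read($cache_dir, $url, $minutes = 5)
{
    $file = cache_path($cache_dir, $url);
    clearstatcache();   // PHP caches stat() results; make filemtime() see recent writes
    // Fresh if it was modified within the last $minutes * 60 seconds.
    if (is_file($file) && filemtime($file) > time() - $minutes * 60) {
        return file_get_contents($file);
    }
    return false;
}

// Stores the page; returns true on success.
function cache_write($cache_dir, $url, $html)
{
    return file_put_contents(cache_path($cache_dir, $url), $html) !== false;
}

// Typical flow (fetch_page() is a placeholder for the curl code):
// $html = cache_read('./cache', $url);
// if ($html === false) {
//     $html = fetch_page($url);
//     cache_write('./cache', $url, $html);
// }
// echo $html;
```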


I am not worried about caching files...

I would just like to rewrite the <a href>, <img src> and <form action> tags.
OK, if you look at


this function -

function completeURLs($HTML, $url){
      $URI_PARTS = parseURL($url);
      $path = trim($URI_PARTS["path"], "/");
      $host_url = trim($URI_PARTS["host"], "/");
      //$host = $URI_PARTS["scheme"]."://".trim($URI_PARTS["host"], "/")."/".$path; //ORIGINAL
      $host = $URI_PARTS["scheme"]."://".$host_url."/".$path."/";
      $host_no_path = $URI_PARTS["scheme"]."://".$host_url."/";
      //Proxifies local META redirects
      $HTML = preg_replace('@<META HTTP-EQUIV(.*)URL=/@', "<META HTTP-EQUIV\$1URL=".$_SERVER['PHP_SELF']."?url=".$host_no_path, $HTML);
      //Make sure the host doesn't end in '//'
      $host = rtrim($host, '/')."/";
      //Replace '//' with 'http://'
      $pattern = "#(?<=\"|'|=)\/\/#"; //the '|=' is experimental as it's probably not necessary
      $HTML = preg_replace($pattern, "http://", $HTML);
      //Fully qualifies '"/'
      $HTML = preg_replace("#\"\/#", "\"".$host, $HTML);
      //Fully qualifies "'/"
      $HTML = preg_replace("#\'\/#", "'".$host, $HTML);
      //Matches [src|href|background|action]="/ because in the following pattern the '/' shouldn't stay
      $HTML = preg_replace("#(src|href|background|action)(=\"|='|=(?!'|\"))\/#i", "\$1\$2".$host_no_path, $HTML);
      $HTML = preg_replace("#(href|src|background|action)(=\"|=(?!'|\")|=')(?!http|ftp|https|\"|'|javascript:|mailto:)#i", "\$1\$2".$host, $HTML);
      //Points all form actions back to the proxy
      $HTML = preg_replace('/<form.+?action=\s*(["\']?)([^>\s"\']+)\\1[^>]*>/i', "<form action=\"{$_SERVER['PHP_SELF']}\"><input type=\"hidden\" name=\"original_url\" value=\"$2\">", $HTML);
      //Matches '/[any assortment of chars or nums]/../'
      $HTML = preg_replace("#\/(\w*?)\/\.\.\/(.*?)>#ims", "/\$2>", $HTML);
      //Matches '/./'
      $HTML = preg_replace("#\/\.\/(.*?)>#ims", "/\$1>", $HTML);

      //Handles CSS2 imports
      if (strpos($HTML, "import url(\"http") === false && (strpos($HTML, "import \"http") === false) && strpos($HTML, "import url(\"www") === false && (strpos($HTML, "import \"www") === false)) {
            $pattern = "#import .(.*?).;#ims";
            $mainurl = substr($host, 0, strnpos($host, "/", 3));
            $replace = "import '".$mainurl."\$1';";
            $HTML = preg_replace($pattern, $replace, $HTML);
      }
      return $HTML;
}

takes the file contents as $HTML and converts every relative link into a fully qualified one, so that the domain always appears in the link. Then this function -

function proxyURLs($HTML){
      $edited_tag = "PF"; //used to check if the link has already been modified by the proxy
      //BASE tag needs to be removed for sites like yahoo.com
      //OR make the proxy insert the FULL URL to itself
      $pattern = "#\<base(.*?)\>#ims";
      $replacement = "<!-- <base\$1> -->"; //comment it out for now//
      $HTML = preg_replace($pattern, $replacement, $HTML);
      //edit <link tags so that 'edited="$edit_tag" ' is just before 'href'
      $HTML = preg_replace("#\<link(.*?)(\shref=)#ims", "<link\$1 edited=\"".$edited_tag."\"\$2", $HTML);
      //matches everything with an </a> after it on the same line....fails to match when that is on another line.
      $pattern = "#(?<!edited=\"".$edited_tag."\"\s)(href='|href=\"|href=(?!'|\"))(?=(.+)\</a\>)(?!mailto:|http://ftp|ftp|javascript:|'|\")#ims";
      $HTML = preg_replace($pattern, "edited=\"".$edited_tag."\" \$1".$_SERVER['PHP_SELF'].'?url=', $HTML);
      return $HTML;
}

takes every link in the page (again passed in as $HTML) and prepends the address of the proxy script itself. Finally, this section near the bottom -

      if ($proxify_media) {
            $pattern = '/src=\s*(["\']?)([^>\s"\']+)\\1[^>]*>/i';
            $replace = "src=\"{$_SERVER['PHP_SELF']}?url=$2\">";
            $HTML = preg_replace($pattern, $replace, $HTML);
      }

does the same thing for images and script files (since those tags carry src= attributes). All you need to do is fetch the page at the URL, pass its contents through the first two functions, and then apply the last chunk of code to the result.
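The core of what completeURLs() does with its chain of regular expressions (turning links found on the page at $url into fully qualified URLs, including collapsing `/./` and `dir/../`) can be sketched more directly with parse_url(). This is a simplified, hypothetical helper, not PageForward's code: it does not handle query-only or fragment-only links such as `?page=2` or `#top`, and it ignores any <base> tag.

```php
<?php
// Hypothetical helper: resolve $link (as found in the page at $base) into a
// fully qualified URL. Query-only and fragment-only links are not handled.
function absolute_url($base, $link)
{
    // Already has a scheme (http:, https:, mailto:, javascript:, ...).
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $link)) {
        return $link;
    }
    $b = parse_url($base);
    // Protocol-relative //host/path: reuse the scheme of the page.
    if (substr($link, 0, 2) === '//') {
        return $b['scheme'] . ':' . $link;
    }
    $root = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    if ($link !== '' && $link[0] === '/') {
        $path = $link;                               // root-relative
    } else {
        // Page-relative: keep the base path up to and including its last '/'.
        $base_path = isset($b['path']) ? $b['path'] : '/';
        $path = substr($base_path, 0, strrpos($base_path, '/') + 1) . $link;
    }
    // Collapse "." and "dir/.." segments (what the '/./' and '/../' regexes do).
    $out = array();
    foreach (explode('/', $path) as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);                     // never pop the leading '' (root)
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    $clean = implode('/', $out);
    return $root . ($clean === '' ? '/' : $clean);
}
```

The result can then be prefixed with the proxy address, e.g. `$_SERVER['PHP_SELF'] . '?url=' . urlencode(absolute_url($url, $link))`, which is what proxyURLs() does apart from the urlencode().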



thank you!