Solved

Rewriting href src and action

Posted on 2006-03-30
Last Modified: 2008-03-10
OK, here is what I am doing:

I am writing a page that fetches another page for caching purposes.

What I want to do is this:

Replace any <img src="/relative.link"> with <img src="http://mysite.com/image.php?site=http://theirdomain.com/relative.link">

I also want to do the same with href and action.

There may be other HTML attributes between the start of the tag and the attribute I am rewriting.

The quoting may also vary: double quotes ("), single quotes ('), or none at all.

I also want href and action to go to their corresponding pages, like /htmlpage.php and /formpage.php.

If possible, I would like to do this with regular expressions.

The domain will be stored in $domain without a trailing '/'.

Thank you if you can help!
Question by:askanthonys
 
LVL 4

Author Comment

by:askanthonys
ID: 16337823
Also, their links may already include their domain, in which case I would only want to prepend http://mysite.com/image.php?site=
 
LVL 2

Expert Comment

by:CaveyCoUk
ID: 16338052
I am sure there was a quick script on php.net but I can't find it. A quick Google search brings up something called PageForward

http://joshdick.net/index.php?section=creations

which led me to

http://sbp.sf.net/

It seems to do what you want but doesn't cache the pages. Given that it is written in PHP, it should be easy to customise using ob_start, ob_get_contents, ob_end_clean and ob_flush...
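
As a rough illustration of the output-buffering idea, here is a minimal sketch. It is not taken from PageForward: the function name cached_output, the cache file path and the five-minute default are assumptions made for this example only.

```php
<?php
// Hypothetical helper (not part of PageForward): run $render, capture what it
// echoes with output buffering, and keep a copy on disk for $ttlSeconds.
function cached_output(callable $render, string $cacheFile, int $ttlSeconds = 300): string
{
    // Serve the stored copy while it is still fresh.
    if (is_file($cacheFile) && filemtime($cacheFile) > time() - $ttlSeconds) {
        return (string) file_get_contents($cacheFile);
    }

    ob_start();                            // start capturing everything echoed below
    $render();                             // the page (or proxy) prints its output here
    $html = (string) ob_get_contents();    // read the captured output
    ob_end_clean();                        // close the buffer without sending it

    file_put_contents($cacheFile, $html, LOCK_EX);
    return $html;
}
```

A caller would then do something like echo cached_output($render, $file); so the page is only rebuilt once the stored copy is older than the chosen lifetime.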
 
LVL 4

Author Comment

by:askanthonys
ID: 16338142
I looked through these files and it just seems to be an endless amount of includes and stuff.

I couldn't easily find what I was looking for...

If you could tell me exactly where this code is, I would be grateful.

And just ignore the caching part... I decided that would put too much stress on my server.

 
LVL 2

Expert Comment

by:CaveyCoUk
ID: 16342998
I don't quite understand.

PageForward is here - http://prdownloads.sourceforge.net/pageforward/pf1.5b2.zip?download

It's one file; you need php4-curl installed (it's a PHP module) for it to work. If you open the file, look for "$proxify_media = false" and change the false to true; it should then also do the rewriting for images and script files.

If you want it to cache files, find

//**END USER CONFIG**

and add

//Cache files
$cache_files = true;
$cache_dir = "./cache";
//**END USER CONFIG**

then find

if(!$form_submission){

and add

if ($cache_files){
      if ( is_file($cache_dir."/".urlencode($url)) ){
            if ( filemtime($cache_dir."/".urlencode($url)) > (time() - (5 * 60)) ){
                  readfile($cache_dir."/".urlencode($url));
                  exit();
            }
      }
}
if(!$form_submission){

(Change the 5 to however many minutes you want to wait before it re-fetches a file; set it to 10 and it won't bother for 10 minutes.) Finally, find

      echo ("\n<!-- PageForward v1.5b2 took $duration seconds to construct this page.-->");

and add

      if ($cache_files){
            $fp = fopen($cache_dir."/".urlencode($url),"w+b");
            fwrite($fp,$HTML);
            fclose($fp);
      }
      echo ("\n<!-- PageForward v1.5b2 took $duration seconds to construct this page.-->");

It should then cache all the files into that directory. Note that this could soon become quite a large collection and will probably need to be cleaned out on a regular basis.
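
For the regular clean-out, something along these lines could be run from cron. This is only a sketch: prune_cache is a made-up helper, not part of PageForward, and the one-hour limit in the usage note is an arbitrary choice.

```php
<?php
// Hypothetical helper: delete files in $cacheDir whose modification time is
// older than $maxAgeSeconds. Returns the number of files removed.
function prune_cache(string $cacheDir, int $maxAgeSeconds): int
{
    if (!is_dir($cacheDir)) {
        return 0;
    }
    $removed = 0;
    $cutoff = time() - $maxAgeSeconds;
    foreach (scandir($cacheDir) ?: [] as $name) {
        $file = rtrim($cacheDir, '/') . '/' . $name;
        // is_file() skips "." and ".." as well as any subdirectories.
        if (is_file($file) && filemtime($file) < $cutoff && unlink($file)) {
            $removed++;
        }
    }
    return $removed;
}
```

For example, prune_cache("./cache", 3600) would remove everything that has not been rewritten in the last hour.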
 
LVL 4

Author Comment

by:askanthonys
ID: 16346160
I am not worried about caching files...

I would just like to rewrite the <a href> <img src> and the <form action> tags
 
LVL 2

Accepted Solution

by:
CaveyCoUk earned 2000 total points
ID: 16353828
Ok, if you look at

http://prdownloads.sourceforge.net/pageforward/pf1.5b2.zip?download 

this function -

function completeURLs($HTML, $url){
      $URI_PARTS = parseURL($url);
      $path = trim($URI_PARTS["path"], "/");
      $host_url = trim($URI_PARTS["host"], "/");
      
      //$host = $URI_PARTS["scheme"]."://".trim($URI_PARTS["host"], "/")."/".$path; //ORIGINAL
      $host = $URI_PARTS["scheme"]."://".$host_url."/".$path."/";
      $host_no_path = $URI_PARTS["scheme"]."://".$host_url."/";
      
      //Proxifies local META redirects
      $HTML = preg_replace('@<META HTTP-EQUIV(.*)URL=/@', "<META HTTP-EQUIV\$1URL=".$_SERVER['PHP_SELF']."?url=".$host_no_path, $HTML);
      
      //Make sure the host doesn't end in '//'
      $host = rtrim($host, '/')."/";
      
      //Replace '//' with 'http://'
      $pattern = "#(?<=\"|'|=)\/\/#"; //the '|=' is experimental as it's probably not necessary
      $HTML = preg_replace($pattern, "http://", $HTML);
      
      //Fully qualifies '"/'
      $HTML = preg_replace("#\"\/#", "\"".$host, $HTML);
      
      //Fully qualifies "'/" (the replacement is a plain quote; "\'" in a double-quoted string would insert a literal backslash)
      $HTML = preg_replace("#\'\/#", "'".$host, $HTML);
      
      //Matches [src|href|background|action]="/ because in the following pattern the '/' shouldn't stay
      $HTML = preg_replace("#(src|href|background|action)(=\"|='|=(?!'|\"))\/#i", "\$1\$2".$host_no_path, $HTML);
      $HTML = preg_replace("#(href|src|background|action)(=\"|=(?!'|\")|=')(?!http|ftp|https|\"|'|javascript:|mailto:)#i", "\$1\$2".$host, $HTML);
      
      //Points all form actions back to the proxy
      $HTML = preg_replace('/<form.+?action=\s*(["\']?)([^>\s"\']+)\\1[^>]*>/i', "<form action=\"{$_SERVER['PHP_SELF']}\"><input type=\"hidden\" name=\"original_url\" value=\"$2\">", $HTML);
      
      //Matches '/[any assortment of chars or nums]/../'
      $HTML = preg_replace("#\/(\w*?)\/\.\.\/(.*?)>#ims", "/\$2>", $HTML);
      
      //Matches '/./'
      $HTML = preg_replace("#\/\.\/(.*?)>#ims", "/\$1>", $HTML);

      //Handles CSS2 imports (strict === false, because strpos() returns 0 for a match at offset 0)
      if (strpos($HTML, "import url(\"http") === false && (strpos($HTML, "import \"http") === false) && strpos($HTML, "import url(\"www") === false && (strpos($HTML, "import \"www") === false)) {
            $pattern = "#import .(.*?).;#ims";
            $mainurl = substr($host, 0, strnpos($host, "/", 3));
            $replace = "import '".$mainurl."\$1';";
            $HTML = preg_replace($pattern, $replace, $HTML);
      }
            
      return $HTML;
}

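takes the file contents in as $HTML and changes all the links into fully qualified links, so that their domain is always in the link. (parseURL and strnpos are not PHP built-ins; they are presumably helper functions defined elsewhere in the same file.) Then this function -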

function proxyURLs($HTML){
      $edited_tag = "PF"; //used to check if the link has already been modified by the proxy
      
      //BASE tag needs to be removed for sites like yahoo.com
      //OR make the proxy insert the FULL URL to itself
      $pattern = "#\<base(.*?)\>#ims";
      $replacement = "<!-- <base\$1> -->"; //comment it out for now//
      $HTML = preg_replace($pattern, $replacement, $HTML);
      
      //edit <link tags so that 'edited="$edit_tag" ' is just before 'href'
      $HTML = preg_replace("#\<link(.*?)(\shref=)#ims", "<link\$1 edited=\"".$edited_tag."\"\$2", $HTML);
      
      //matches everything with an </a> after it on the same line....fails to match when that is on another line.
      $pattern = "#(?<!edited=\"".$edited_tag."\"\s)(href='|href=\"|href=(?!'|\"))(?=(.+)\</a\>)(?!mailto:|http://ftp|ftp|javascript:|'|\")#ims";
      $HTML = preg_replace($pattern, "edited=\"".$edited_tag."\" \$1".$_SERVER['PHP_SELF'].'?url=', $HTML);
      
      return $HTML;
}

takes every link in the page (again, as $HTML) and prepends the URL of the current proxy script. Finally, this section near the bottom -

      if ($proxify_media) {
            $pattern = '/src=\s*(["\']?)([^>\s"\']+)\\1[^>]*>/i';
            $replace = "src=\"{$_SERVER['PHP_SELF']}?url=$2\">";
            $HTML = preg_replace($pattern, $replace, $HTML);
      }

does the same thing for images and scripts (anything that has a src= attribute). All you need to do is fetch the page for the URL into $HTML, call the top two functions, and then apply the last chunk of code to the result.
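
If you would rather not pull in all of PageForward, the same idea can be condensed into one regular expression for the setup described in the question. Treat this as a sketch under assumptions, not as PageForward's code: rewrite_links and $proxyBase are names invented here, the mapping of src, href and action to image.php, htmlpage.php and formpage.php is my reading of the question, and the rawurlencode() call is an addition so that the target's own query string survives being passed in ?site=. Links that are neither absolute nor starting with "/" are simply treated as relative to the site root.

```php
<?php
// Hypothetical sketch: send src/href/action through local handler scripts.
// $domain is the remote site without a trailing "/", as in the question.
function rewrite_links(string $html, string $domain, string $proxyBase): string
{
    // Assumed mapping of attribute to handler script.
    $handlers = ['src' => 'image.php', 'href' => 'htmlpage.php', 'action' => 'formpage.php'];

    // Attribute name after whitespace, then a double-quoted, single-quoted or
    // unquoted value. The (?| ... ) branch reset keeps the value in group 2.
    $pattern = '/(?<=\s)(src|href|action)\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>"\']+))/i';

    return preg_replace_callback($pattern, function (array $m) use ($handlers, $domain, $proxyBase) {
        $attr = strtolower($m[1]);
        $url = $m[2];

        // Leave empty values, in-page anchors and non-HTTP schemes untouched.
        if ($url === '' || $url[0] === '#' || preg_match('#^(?:mailto|javascript|data|ftp):#i', $url)) {
            return $m[0];
        }
        if (strpos($url, '//') === 0) {
            $url = 'http:' . $url;                      // protocol-relative link
        } elseif (!preg_match('#^https?://#i', $url)) {
            $url = $domain . '/' . ltrim($url, '/');    // relative link: add their domain
        }
        // Links that already carry a domain only get the proxy prefix.
        return $attr . '="' . $proxyBase . '/' . $handlers[$attr] . '?site=' . rawurlencode($url) . '"';
    }, $html);
}
```

Usage would be along the lines of $HTML = rewrite_links($HTML, $domain, "http://mysite.com"); Because the pattern only looks for whitespace before the attribute name, it can also match text outside of tags that happens to look like href=..., so check the output on the pages you care about.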
 
LVL 4

Author Comment

by:askanthonys
ID: 16356135
thank you!