Solved

String Manipulation in PHP - working with HTML and paths

Posted on 2004-10-15
Last Modified: 2008-03-03
I need to do quite a lot of string manipulation in PHP for an application that we are setting up.  I am familiar with PHP, but I've not used the string functions much, and after poring over some resources this afternoon, I decided that the experts would provide a quicker way for me to get up to speed.

The app is pulling entire web pages from other domains into a string and then delivering them inline as part of a locally hosted page.  Here is what I need to be able to do:

1) For images, stylesheet references, .js files, et al., I need to take all the non-explicit (relative) links and make them explicit (absolute), e.g.
               src="/fred/index.php"
becomes: src="http://www.theirserver.com/fred/index.php"

2) Manipulate the string so that links like <a href="/fred/index.php"> click me </a>
become: <a href="http://www.myserver.com/page.php?linkpath=http://www.theirserver.com/fred/index.php"> click me </a>
(the links get handled by the local app, and then the corresponding remote page is loaded inline as part of a local page.)

3) In both of these scenarios I also need to accommodate page-relative links, which will need to be converted to the full path, including the domain.

      Of course, the use of whitespace in the first two examples will vary as HTML allows (it can be src= ", src = " or src =", et al.)

4) Finally (and I'll gladly put this up as a separate question with points if I'm asking too much here for my 500...), I need to trace the user's click path as they move from page to page.  Their requests will all be handled by the same PHP page on the local site (page.php in point 2 above), with the target page appearing after the ?.  I'm assuming that I cannot store that data in an array, as the array would be re-created each time that page.php is loaded, so my preference would be to write the data to an XML document so that I can manipulate it with other tools for reporting purposes.  I would like to be able to transform the XML document into HTML for display on the site as well.


Question by:shotokai

Expert Comment

by:caterham_www
ID: 12328067
Hi,

for 1) + 3) try this:

<?
//counter-Variable
$i=0;
//your string
$var='hello world <src ="/fred/index.php" img src= "hello/test.php"> ';

do {
      ${"var".$i}=$var;
      $var = eregi_replace ("src\ ?=\ ?\"(.[^:]*)\"","src=\"http://www.theirserver.com\\1\"",$var);
} while ($var != ${"var".$i})

?>
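
A hedged alternative for 1) + 3): eregi_replace was deprecated in PHP 5.3 and removed in PHP 7, so on current versions the same idea is usually written with the preg functions. The sketch below is only an illustration, not the code used in this thread. The names absolutize() and rewrite_src() are hypothetical, only double-quoted src values are handled, ports and "../" segments are ignored, and stylesheet href values would need the same treatment.

```php
<?php
// Hypothetical helper: resolve a link against the URL of the remote page.
function absolutize(string $link, string $pageUrl): string
{
    // Already absolute (has a scheme) or protocol-relative: leave untouched.
    if (preg_match('~^(?:[a-z][a-z0-9+.-]*:|//)~i', $link)) {
        return $link;
    }
    $parts = parse_url($pageUrl);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    if ($link !== '' && $link[0] === '/') {
        // Root-relative: prepend scheme and host only.
        return $root . $link;
    }
    // Page-relative: prepend the directory of the current page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}

// Hypothetical helper: make every double-quoted src="..." absolute.
function rewrite_src(string $html, string $pageUrl): string
{
    // \s* allows any whitespace around "=", and [^"]* stops at the
    // closing quote, so one match never runs into the next attribute.
    return preg_replace_callback(
        '~\bsrc\s*=\s*"([^"]*)"~i',
        function (array $m) use ($pageUrl): string {
            return 'src="' . absolutize($m[1], $pageUrl) . '"';
        },
        $html
    );
}

$html = '<img src ="/fred/logo.png"> <script src= "js/app.js"></script>';
echo rewrite_src($html, 'http://www.theirserver.com/fred/index.php');
```

Because already-absolute values are returned unchanged, a single pass is enough and the do/while loop is not needed.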

and for 2) + 3)

<?
//counter-Variable
$i=0;
//your string
$var='<a href ="/fred/index.php">hello world</a> <a href= "/hello/test.php">hello</a>';
do {
      ${"var".$i}=$var;
      $var = eregi_replace ("a href\ ?=\ ?\"(.[^:]*)\"","a href=\"http://www.myserver.com/page.php?linkpath=http://www.theirserver.com\\1\"",$var);
} while ($var != ${"var".$i})
?>


Bob
 

Author Comment

by:shotokai
ID: 12339469
Thanks Bob:

this is kind of weird, but the eregi_replace isn't doing anything.  I used variables for the search and replace strings so that I could verify that nothing was happening.  Here is the code behind the page, including the part where the html content of the subject site gets pulled into a string:

<? if (! empty($testsite)) {
 
// first step is to determine the domain of the site being tested
//get rid of the http:// if it is used
$search = array("http://");
$replace = array("");
$domain = str_replace($search,$replace,$testsite);
//get rid of the trailing slash and anything thereafter
$trimright = strcspn($domain,"/");
$domain = substr($domain,"0",$trimright);
$domain = str_replace("/*","",$domain);
//$domain = substr($testsite, 7);
echo ($domain)."<br>";
$testsite = ("http://" . $domain) ;
echo ($testsite);
$incpage = include($testsite);

// next we run through the string (the html content of the test site) and
//      1) amend the src references to include the full domain of the site
$i=0;
do {
  //  ${"incpage".$i}=$incpage;
      $search = "a";
      //$search = "src\ ?=\ ?\"(.[^:]*)\""
      $replace= "apples";
      //$replace = "src=\"http://www.theirserver.com\\1\""
     $incpage = eregi_replace ($search,$replace,$incpage);
} while ($incpage != ${"incpage".$i});


//      2) next we need to change any link references to include the path to the usability application
$i=0;
do {
     ${"incpage".$i}=$incpage;
     $incpage = eregi_replace ("a href\ ?=\ ?\"(.[^:]*)\"","a href=\"http://www.myserver.com/page.php?linkpath=http://www.theirserver.com\\1\"",$incpage);
} while ($incpage != ${"incpage".$i});
echo $incpage;
}; //end of the first IF

?>
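
The domain-extraction step at the top of this code (str_replace, strcspn, substr) can also be sketched with PHP's built-in parse_url(). This is a minimal illustration under assumptions: site_root() is a hypothetical name, and input typed without a scheme is assumed to mean http.

```php
<?php
// Hypothetical helper: reduce whatever was typed to "scheme://host".
function site_root(string $input): ?string
{
    $input = trim($input);
    // parse_url() only reports a host when a scheme is present, so add one.
    if (!preg_match('~^https?://~i', $input)) {
        $input = 'http://' . $input;
    }
    $parts = parse_url($input);
    if ($parts === false || !isset($parts['host'])) {
        return null; // could not find a host name
    }
    return $parts['scheme'] . '://' . $parts['host'];
}

echo site_root('http://www.theirserver.com/fred/index.php'), "\n";
echo site_root('www.theirserver.com/fred/'), "\n";
```

Both calls print http://www.theirserver.com, with the path and trailing slash dropped.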

Expert Comment

by:caterham_www
ID: 12339617
What does the string $testsite look like? Can you post an example for testing?
 

Author Comment

by:shotokai
ID: 12340616
In trying to post the content of the variable $incpage, I can see what is wrong - I think.  The source content of the page is not loading into the variable as I had thought.  Instead when I use include() or require(), the page source is output inline there and then.  The value of $incpage is 1.

Is there a function that will allow me to load the source of the remote page into a string?

Expert Comment

by:caterham_www
ID: 12341185
Did you mean
$incpage = include($testsite);
?

I think that to get the source of a remote page, you have to use fopen or fsockopen;

see http://www.php.net/fsockopen and http://www.php.net/fopen
But with the line
$domain = substr($domain,"0",$trimright);
you are destroying the complete string.
Ex. 'src="/john/sa.html" test src = "/edde/ggt.css' becomes
' ' (simply blank)

or
'src="http://john/sa.html" test src = "http://edde/ggt.css' becomes
src="john (/sa.html... is missing)

Author Comment

by:shotokai
ID: 12363174
Ok, things are going much better now.

I needed to use the file_get_contents() function to load the source into the string.  PHP version was 4.1.1, and the function isn't available below 4.3, so I've been cursing and upgrading for a couple of days.  That is behind us now.

In the code above, $domain is a short string that holds the name of the site in the form of xxx.server.xxx.  It isn't what I'm manipulating further down the page; that is $incpage, which is populated through file_get_contents($testsite).
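
A minimal sketch of that loading step, with the failure case checked. The name fetch_source() and the 10-second timeout are assumptions for illustration. Fetching an http:// URL this way also depends on the allow_url_fopen setting being enabled. The demo reads a temporary local file only so that it runs without a network.

```php
<?php
// Hypothetical helper: return a page's source as a string, or null on failure.
function fetch_source(string $url): ?string
{
    // A short timeout so a slow remote site cannot hang the local page
    // (the "http" options only apply when $url is an http:// address).
    $context = stream_context_create(['http' => ['timeout' => 10]]);
    // file_get_contents() returns false on failure and emits a warning,
    // suppressed here with @ because the return value is checked instead.
    $source = @file_get_contents($url, false, $context);
    return $source === false ? null : $source;
}

// Demo with a local file; in the application $testsite would be passed in.
$tmp = tempnam(sys_get_temp_dir(), 'pg');
file_put_contents($tmp, '<html><body>hello</body></html>');
$incpage = fetch_source($tmp);
echo $incpage === null ? 'Could not load the page.' : strlen($incpage) . ' bytes loaded';
unlink($tmp);
```

Unlike include(), this returns the source without outputting or executing it, so $incpage holds the HTML rather than 1.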

The two bits of code that caterham_www provided above are now working.  One issue, in the second bit of code:

$i=0;
do {
     ${"incpage".$i}=$incpage;
     $incpage = eregi_replace ("a href\ ?=\ ?\"(.[^:]*)\"","a href=\"http://www.myserver.com/page.php?linkpath=http://www.theirserver.com\\1\"",$incpage);
} while ($incpage != ${"incpage".$i});
 

Author Comment

by:shotokai
ID: 12363361
(I got truncated somehow.)

So if the link has anything between the 'a' and the 'href', the match isn't picked up.  I don't understand the matching syntax.  So if you can clear this up for me...

Accepted Solution

by:
caterham_www earned 500 total points
ID: 12364852
Hi,

try this one:

a .* href\ ?=\ ?\"(.[^:]*)\"
-->
$incpage = eregi_replace ("a .* href\ ?=\ ?\"(.[^:]*)\"","a href=\"http://www.myserver.com/page.php?linkpath=http://www.theirserver.com\\1\"",$incpage);

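will match
a target="" href="" etc.
Note that a plain a href="" is not matched by this pattern, because it requires something between the a and the href, so keep the earlier replacement for those links.
If you would like to include e.g. target="" (the things between a and href) in your replaced string: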

$incpage = eregi_replace ("a (.*) href\ ?=\ ?\"(.[^:]*)\"","a \\1 href=\"http://www.myserver.com/page.php?linkpath=http://www.theirserver.com\\2\"",$incpage);

a target="as" href="/hhh" will become a target="as" href="http://www.myserver..."
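
For current PHP versions, where eregi_replace no longer exists, the same rewrite can be sketched with preg_replace_callback. This is an illustration under assumptions, not the accepted code. The name rewrite_links() is hypothetical, only double-quoted root-relative links (starting with a single "/") are handled, and the target is URL-encoded, which the patterns above do not do.

```php
<?php
// Hypothetical helper: route root-relative links through the local page.
function rewrite_links(string $html, string $remoteRoot, string $localPage): string
{
    // "<a", then any attributes up to href (lazy, never crossing ">"),
    // then a double-quoted value that starts with a single "/".
    $pattern = '~<a\b([^>]*?)\bhref\s*=\s*"(/(?!/)[^"]*)"~i';
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($remoteRoot, $localPage): string {
            // Encode the target so its own "?" and "&" stay inside linkpath.
            $target = urlencode($remoteRoot . $m[2]);
            return '<a' . $m[1] . 'href="' . $localPage . '?linkpath=' . $target . '"';
        },
        $html
    );
}

$html = '<a target="as" href = "/hhh">x</a> <a href="http://other.example/">y</a>';
echo rewrite_links($html, 'http://www.theirserver.com', 'http://www.myserver.com/page.php');
```

Using [^>]*? instead of .* keeps a match inside a single tag, and it matches both a plain a href and one with other attributes in front. On the receiving side, PHP decodes the value automatically when page.php reads $_GET['linkpath'].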
 

Author Comment

by:shotokai
ID: 12369522
That worked well thanks.

There are still some instances that it is missing, but it isn't fair of me to ask you to deal with these.  I'm going to go ahead and accept your answer(s), and thanks very much for your time!  Can you point me towards an online resource that explains the syntax for the string matching and replacing as you've used it?

thanks again
 

Author Comment

by:shotokai
ID: 12370635
Hey - I'll save you the pain.  I found some really good reference material on phpfreaks:

http://www.phpfreaks.com/tutorials/63/3.php (specific to eregi_replace and the ERE POSIX syntax)

Thanks again

Expert Comment

by:caterham_www
ID: 12372831
Hi,

Thanks. Another interesting site about regular expressions is http://www.regular-expressions.info
It's not about eregi_replace itself but about regular expressions in general, which are of course the pattern part of eregi_replace.
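
For point 4) of the question (tracing the click path into an XML document), a minimal sketch might look like the following. Everything here is an assumption for illustration: the name log_click(), the clickpaths/click/url layout, and the use of the SimpleXML extension. Nothing locks the file, so simultaneous requests could overwrite each other's entries.

```php
<?php
// Hypothetical helper: append one click to an XML log file.
function log_click(string $logFile, string $sessionId, string $target): void
{
    // Load the existing log; start a fresh document if there is none
    // (an unreadable log is also replaced, which loses its contents).
    $xml = is_file($logFile) ? simplexml_load_file($logFile) : false;
    if ($xml === false) {
        $xml = new SimpleXMLElement('<clickpaths/>');
    }
    $click = $xml->addChild('click');
    $click->addAttribute('session', $sessionId);
    $click->addAttribute('time', date('c'));
    // Property assignment escapes "&" in the URL; addChild's value does not.
    $click->url = $target;
    $xml->asXML($logFile);
}

// Demo with a temporary file and a made-up session id.
$log = tempnam(sys_get_temp_dir(), 'clk');
unlink($log); // start from "no log yet"
log_click($log, 'demo-session', 'http://www.theirserver.com/fred/index.php?a=1&b=2');
log_click($log, 'demo-session', 'http://www.theirserver.com/hello/test.php');
echo file_get_contents($log);
```

In page.php the call would presumably be made once per request, after session_start(), with session_id() and the linkpath parameter as arguments. The resulting file could then be transformed to HTML with XSLT for display.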