Solved

String Manipulation in PHP - working with HTML and paths

Posted on 2004-10-15
Last Modified: 2008-03-03
I need to do quite a lot of string manipulation in PHP for an application that we are setting up. I am familiar with PHP, but I haven't used the string functions much, and after poring over some resources this afternoon, I decided that the experts would provide a quicker way for me to get up to speed.

The app is pulling entire web pages from other domains into a string and then delivering them inline as part of a locally hosted page.  Here is what I need to be able to do:

1) For images, stylesheet references, .js files, etc., I need to take all the non-explicit links (there is a proper term for this which escapes me...) and make them explicit, e.g.
               src="/fred/index.php"
becomes: src="http://www.theirserver.com/fred/index.php"

2) Manipulate the string so that links like <a href="/fred/index.php"> click me </a>
become: <a href="http://www.myserver.com/page.php?linkpath=http://www.theirserver.com/fred/index.php"> click me </a>
(The links get handled by the local app, and then the corresponding remote page is loaded inline as part of a local page.)

3) In both of these scenarios I also need to accommodate page-relative links, which will need to be converted to the full path, including the domain.

Of course, the whitespace in the first two examples will vary as HTML allows (it can be src= " or src = " or src =" , etc.)

4) Finally (and I'll gladly put this up as a separate question with points if I'm asking too much here for my 500...), I need to trace the user's click path as they move from page to page. Their requests will all be handled by the same PHP page on the local site (page.php in point 2 above), with the target page appearing after the ?. I'm assuming that I cannot store that data in an array, as the array would be re-dimensioned each time page.php is loaded, so my preference would be to write the data to an XML document that I can manipulate with other tools for reporting purposes. I would also like to be able to transform the XML document into HTML for display on the site.
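A minimal sketch for point 4, assuming PHP 5 or later (the DOMDocument class is not part of PHP 4): each request to page.php appends one <click> element to an XML file, so the click path persists between requests. The function log_click(), the file path and the clickpath/click element names are hypothetical. PHP sessions ($_SESSION) would also keep an array alive between requests for the same visitor; a file simply makes reporting across visitors easier.

```php
<?php
// Hypothetical helper: append one click to an XML log file so the click path
// survives from one request to page.php to the next.
function log_click($xmlFile, $sessionId, $targetUrl)
{
    clearstatcache();                    // filesize() results are cached within a request

    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->preserveWhiteSpace = false;    // lets formatOutput re-indent a loaded file
    $doc->formatOutput = true;

    if (is_file($xmlFile) && filesize($xmlFile) > 0) {
        $doc->load($xmlFile);            // continue the existing log
        $root = $doc->documentElement;
    } else {
        $root = $doc->createElement('clickpath');
        $doc->appendChild($root);        // start a new log
    }

    $click = $doc->createElement('click');
    $click->setAttribute('session', $sessionId);
    $click->setAttribute('time', date('c'));
    // createTextNode() escapes characters such as & in the URL
    $click->appendChild($doc->createTextNode($targetUrl));
    $root->appendChild($click);

    return $doc->save($xmlFile) !== false;   // save() returns bytes written or false
}
```

In page.php it could be called after session_start(), e.g. log_click('/path/to/clicks.xml', session_id(), $_GET['linkpath']). The XML can then be turned into HTML with an XSLT stylesheet, for example through PHP's XSLTProcessor class (XSL extension). save() does not lock the file, so simultaneous requests could drop entries; a lock file guarded with flock(), or a database, would be safer under load.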


Question by:shotokai

Expert Comment

by:caterham_www
ID: 12328067
Hi,

for 1) + 3) try this:

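<?php
// counter variable (only used to build the name of the snapshot variable $var0)
$i=0;
// your string
$var='hello world <src ="/fred/index.php" img src= "hello/test.php"> ';

// eregi_replace() does case-insensitive POSIX ERE matching (deprecated in PHP 5.3, removed in PHP 7).
// Repeat until the string stops changing: (.[^:]*) is greedy, so one pass may
// swallow several attributes and rewrite only the first of them.
// Note: a page-relative path such as hello/test.php comes out as http://www.theirserver.comhello/test.php
do {
      ${"var".$i}=$var;
      $var = eregi_replace ("src\ ?=\ ?\"(.[^:]*)\"","src=\"http://www.theirserver.com\\1\"",$var);
} while ($var != ${"var".$i});

?>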
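For comparison, a sketch of points 1 and 3 using preg_replace_callback() (PCRE) instead of eregi_replace(); this is a swapped-in technique, not the code above, and it assumes PHP 5.3+ for the anonymous function. The lazy (.*?) capture stops at the matching closing quote, so every src attribute is handled in one pass, whereas (.[^:]*) above is greedy and can run from the first quote to the last quote before a colon, which is why that code has to loop. make_absolute() and absolutize_src() are hypothetical names; ../ segments and a port in the page URL are not handled, and stylesheet <link href="..."> references would need the same treatment for href.

```php
<?php
// Hypothetical helper: turn a relative URL into an absolute one, given the
// URL of the page it was found on. Absolute URLs are returned unchanged.
function make_absolute($url, $pageUrl)
{
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $url)) {
        return $url;                              // has a scheme: http:, data:, ...
    }
    $p = parse_url($pageUrl);
    if (substr($url, 0, 2) === '//') {
        return $p['scheme'] . ':' . $url;         // protocol-relative
    }
    $origin = $p['scheme'] . '://' . $p['host'];
    if (substr($url, 0, 1) === '/') {
        return $origin . $url;                    // root-relative
    }
    // page-relative: resolve against the directory of the current page
    $path = isset($p['path']) ? $p['path'] : '/';
    return $origin . substr($path, 0, strrpos($path, '/') + 1) . $url;
}

// Rewrite every src="..." or src='...', whatever the spacing around "=".
function absolutize_src($html, $pageUrl)
{
    return preg_replace_callback(
        '/\bsrc\s*=\s*(["\'])(.*?)\1/i',
        function ($m) use ($pageUrl) {
            return 'src=' . $m[1] . make_absolute($m[2], $pageUrl) . $m[1];
        },
        $html
    );
}
```

Used as $incpage = absolutize_src($incpage, $testsite), an attribute src ="/fred/logo.gif" on http://www.theirserver.com/fred/index.php becomes src="http://www.theirserver.com/fred/logo.gif", and src='lib/app.js' becomes src='http://www.theirserver.com/fred/lib/app.js'.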

and for 2) + 3)

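<?php
// counter variable
$i=0;
// your string
$var='<a href ="/fred/index.php">hello world</a> <a href= "/hello/test.php">hello</a>';
// same repeat-until-unchanged loop as above, this time routing links through page.php
// note: the remote URL is not urlencoded, so a & in it would split off into a separate parameter of page.php
do {
      ${"var".$i}=$var;
      $var = eregi_replace ("a href\ ?=\ ?\"(.[^:]*)\"","a href=\"http://www.myserver.com/page.php?linkpath=http://www.theirserver.com\\1\"",$var);
} while ($var != ${"var".$i});
?>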
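A comparable sketch for points 2 and 3, again with preg_replace_callback() rather than eregi_replace() and assuming PHP 5.3+. It keeps any attributes between <a and href, resolves root- and page-relative links against the URL of the page being proxied, and passes the result through urlencode() so that a ?, & or # in the remote URL is not mistaken for part of page.php's own query string. proxy_links() is a hypothetical name; in-page #anchors and non-HTTP schemes (mailto:, javascript:, ...) are left alone, while protocol-relative (//host/...) and ../ links are not handled.

```php
<?php
// Hypothetical helper: point every <a ... href="..."> at the local proxy page,
// passing the absolute remote URL in the linkpath query parameter.
function proxy_links($html, $pageUrl, $proxyPage)
{
    // Work out the origin (scheme + host) and directory of the proxied page.
    $p      = parse_url($pageUrl);
    $origin = $p['scheme'] . '://' . $p['host'];
    $path   = isset($p['path']) ? $p['path'] : '/';
    $dir    = substr($path, 0, strrpos($path, '/') + 1);

    return preg_replace_callback(
        // $1 = "<a ...href=" (attributes before href are kept), $2 = quote, $3 = URL
        '/(<a\b[^>]*?\bhref\s*=\s*)(["\'])(.*?)\2/i',
        function ($m) use ($origin, $dir, $proxyPage) {
            // Attribute values are HTML-escaped, e.g. &amp; for &
            $url = html_entity_decode($m[3], ENT_QUOTES);
            if ($url === '' || substr($url, 0, 1) === '#') {
                return $m[0];                      // empty or in-page anchor: leave it
            }
            if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $url)) {
                if (!preg_match('#^https?://#i', $url)) {
                    return $m[0];                  // mailto:, javascript:, ftp: ...
                }
            } else {
                // root-relative ("/fred/x.php") or page-relative ("x.php")
                $url = (substr($url, 0, 1) === '/' ? $origin : $origin . $dir) . $url;
            }
            // urlencode() stops ?, & and # in the remote URL from being
            // read as part of page.php's own query string
            return $m[1] . $m[2] . $proxyPage . '?linkpath=' . urlencode($url) . $m[2];
        },
        $html
    );
}
```

On the receiving side, PHP decodes the query string automatically, so $_GET['linkpath'] in page.php already holds the plain remote URL.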


Bob

Author Comment

by:shotokai
ID: 12339469
Thanks Bob:

This is kind of weird, but the eregi_replace isn't doing anything. I used variables for the search and replace strings so that I could verify that nothing was happening. Here is the code behind the page, including the part where the HTML content of the subject site gets pulled into a string:

<? if (! empty($testsite)) {
 
// first step is to determine the domain of the site being tested
//get rid of the http:// if it is used
$search = array("http://");
$replace = array("");
$domain = str_replace($search,$replace,$testsite);
//get rid of the trailing slash and anything there after
$trimright = strcspn($domain,"/");
$domain = substr($domain,"0",$trimright);
$domain = str_replace("/*","",$domain);
//$domain = substr($testsite, 7);
echo ($domain)."<br>";
$testsite = ("http://" . $domain) ;
echo ($testsite);
$incpage = include($testsite);

// next we run through the string (the html content of the test site) and
//      1) amend the src references to include the full domain of the site
$i=0;
do {
  //  ${"incpage".$i}=$incpage;
      $search = "a";
      //$search = "src\ ?=\ ?\"(.[^:]*)\""
      $replace= "apples";
      //$replace = "src=\"http://www.theirserver.com\\1\""
     $incpage = eregi_replace ($search,$replace,$incpage);
} while ($incpage != ${"incpage".$i});


//      2) next we need to change any link references to include the path to the usability application
$i=0;
do {
     ${"incpage".$i}=$incpage;
     $incpage = eregi_replace ("a href\ ?=\ ?\"(.[^:]*)\"","a href=\"http://www.myserver.com/page.php?linkpath=http://www.theirserver.com\\1\"",$incpage);
} while ($incpage != ${"incpage".$i});
echo $incpage;
}; //end of the first IF

?>

Expert Comment

by:caterham_www
ID: 12339617
What does the string $testsite look like? Can you post an example for testing?

Author Comment

by:shotokai
ID: 12340616
In trying to post the content of the variable $incpage, I can see what is wrong - I think.  The source content of the page is not loading into the variable as I had thought.  Instead when I use include() or require(), the page source is output inline there and then.  The value of $incpage is 1.

Is there a function that will allow me to load the source of the remote page into a string?

Expert Comment

by:caterham_www
ID: 12341185
Did you mean
$incpage = include($testsite);
?

To get the source of a remote page, I think you have to use fopen or fsockopen;

see http://www.php.net/fsockopen / http://www.php.net/fopen
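A minimal sketch of the fopen() route, assuming allow_url_fopen is enabled so that fopen() accepts http:// URLs. It reads the stream in chunks until end-of-file and uses only functions that also exist in PHP 4, so it should work on versions older than 4.3 where file_get_contents() is missing. read_page() is a hypothetical name.

```php
<?php
// Hypothetical helper: read a whole page into a string with fopen()/fread().
// For http:// URLs this needs allow_url_fopen = On in php.ini.
function read_page($url)
{
    $fp = fopen($url, 'rb');
    if ($fp === false) {
        return false;                  // could not open the URL or file
    }
    $html = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);     // network reads may return less than 8192 bytes
        if ($chunk === false) {
            break;                     // read error: return what we have so far
        }
        $html .= $chunk;
    }
    fclose($fp);
    return $html;
}
```

With it, $incpage = read_page($testsite); leaves the page source in $incpage instead of echoing it the way include() does.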
But with the line
$domain = substr($domain,"0",$trimright);
you are destroying the complete string.
Ex. 'src="/john/sa.html" test src = "/edde/ggt.css' becomes
' ' (simply blank)

or
'src="http://john/sa.html" test src = "http://edde/ggt.css' becomes
src="john (/sa.html... is missing)

Author Comment

by:shotokai
ID: 12363174
Ok, things are going much better now.

I needed to use the file_get_contents() function to load the source into the string.  PHP version was 4.1.1, and the function isn't available below 4.3, so I've been cursing and upgrading for a couple of days.  That is behind us now.

In the above, $domain is a short string that holds the name of the site in the form xxx.server.xxx. It isn't what I'm manipulating further down the page ($incpage), which is populated through file_get_contents($testsite).
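As an aside, the host name can be extracted with parse_url() instead of the str_replace()/strcspn()/substr() sequence. A sketch, assuming PHP 5.1.2+ for the PHP_URL_HOST argument; site_host() is a hypothetical name. parse_url() only recognises a host when a scheme is present, so "http://" is added if the user left it off.

```php
<?php
// Hypothetical helper: return the host name from an address typed by the
// user, with or without a leading "http://"; false if none can be found.
function site_host($testsite)
{
    $testsite = trim($testsite);
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $testsite)) {
        $testsite = 'http://' . $testsite;  // parse_url() needs a scheme to see a host
    }
    $host = parse_url($testsite, PHP_URL_HOST);
    return $host ? strtolower($host) : false;  // host names are case-insensitive
}
```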

The two bits of code that caterham_www provided above are now working. One issue remains with the second bit of code:

$i=0;
do {
     ${"incpage".$i}=$incpage;
     $incpage = eregi_replace ("a href\ ?=\ ?\"(.[^:]*)\"","a href=\"http://www.myserver.com/page.php?linkpath=http://www.theirserver.com\\1\"",$incpage);
} while ($incpage != ${"incpage".$i});

Author Comment

by:shotokai
ID: 12363361
--- I got truncated somehow

So if the link has anything between the 'a' and the 'href', the match isn't picked up. I don't understand the matching syntax, so if you can clear this up for me...

Accepted Solution

by: caterham_www (earned 2000 total points)
ID: 12364852
Hi,

try this one:

a .* href\ ?=\ ?\"(.[^:]*)\"
-->
$incpage = eregi_replace ("a .* href\ ?=\ ?\"(.[^:]*)\"","a href=\"http://www.myserver.com/page.php?linkpath=http://www.theirserver.com\\1\"",$incpage);

will match
a href=""
a target="" href="" etc.
If you would like to keep, e.g., target="" (the things between a and href) in your replaced string:

$incpage = eregi_replace ("a (.*) href\ ?=\ ?\"(.[^:]*)\"","a \\1 href=\"http://www.myserver.com/page.php?linkpath=http://www.theirserver.com\\2\"",$incpage);

a target="as" href="/hhh" will become a target="as" href="http://www.myserver..."
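The .* in this pattern is greedy, and "a (.*) href" also needs text between "a " and " href", so a plain <a href="..."> only matches when (.*) stretches to a later link. With two links on one line, the match runs from the first "a " to the last " href": only the last link is rewritten and the first one ends up unchanged inside \\1. The sketch below ports the pattern to preg_replace() (a swapped-in PCRE equivalent, with the /i flag standing in for the i in eregi) and contrasts it with a tag-bounded variant in which [^>]*? cannot cross a >, so each <a ...> is matched on its own. Like the original, [^":]* skips links that already contain a colon, i.e. absolute ones.

```php
<?php
// The accepted eregi_replace() pattern, ported to PCRE (/i = case-insensitive).
$greedy        = '/a (.*) href ?= ?"(.[^:]*)"/i';
$greedyReplace = 'a $1 href="http://www.myserver.com/page.php?linkpath=http://www.theirserver.com$2"';

// Tag-bounded variant: "<a", then (lazily) anything except ">", then href.
// [^":]* stops at the closing quote and, like the original, skips absolute
// links because they contain a colon.
$bounded        = '/<a\b([^>]*?)\s+href\s*=\s*"([^":]*)"/i';
$boundedReplace = '<a$1 href="http://www.myserver.com/page.php?linkpath=http://www.theirserver.com$2"';

$html = '<a href="/one">1</a> <a href="/two">2</a>';

// Greedy: (.*) swallows the whole first link, so only /two is rewritten.
echo preg_replace($greedy, $greedyReplace, $html), "\n";
// Bounded: both links are rewritten.
echo preg_replace($bounded, $boundedReplace, $html), "\n";
```

The bounded version still turns a target="as" href="/hhh" into a target="as" href="http://www.myserver...", because $1 carries the attributes in front of href.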

Author Comment

by:shotokai
ID: 12369522
That worked well thanks.

There are still some instances where it is missing, but it isn't fair of me to ask you to deal with these. I'm going to go ahead and accept your answer(s), and thanks very much for your time! Can you point me towards an online resource that explains the syntax for the string matching and replacing as you've used it?

thanks again

Author Comment

by:shotokai
ID: 12370635
Hey, I'll save you the pain. I found some really good reference material on phpfreaks:

http://www.phpfreaks.com/tutorials/63/3.php (specific to eregi_replace and the ERE POSIX syntax)

Thanks again

Expert Comment

by:caterham_www
ID: 12372831
Hi,

thanks. Also interesting about regular expressions is this site: http://www.regular-expressions.info
It's not about eregi_replace but about the RegEx, which is of course the pattern-part for eregi_replace.