Solved

Parse through HTML to find and replace blocks

Posted on 2006-11-28
Last Modified: 2012-06-27
I have a block of HTML, shown below:

$theWebPage = <<<WEBPAGE

<html><head><title>The Page</title></head>
<body>
<h2>A Title</h2>
<img alt="ZCMS|SCRIPTURE|1COL1ROW" src="./scripture.gif" style="left: 507px; position: absolute; top: 85px;" />
<p><strong>Some descriptive text</strong></p>
<img src="this.gif" alt="An image" />
<p>$99.99</p>
<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" style="left: 18px; position: absolute; top: 127px;" />
</body>
</html>
WEBPAGE;

I want to parse through the $theWebPage string with PHP, locate every <img> tag that has the special alt="ZCMS|..." substring, and replace it in its entirety with something else. Note, however, that this alt attribute can have multiple values, depending on the type of content being replaced, and the style properties (left and top) will invariably be different for each. The one constant among the images to be replaced is the alt="ZCMS| portion...

Thank you!
Question by:godrifle
13 Comments
 
LVL 3

Expert Comment

by:jmsloan
ID: 18031386
$newwebpage = str_replace($oldtext, $newtext, $theWebPage);
 

Author Comment

by:godrifle
ID: 18031412
Thanks jmsloan. I have no way of knowing precisely what the complete needle string is. Everything in the alt attribute can (and will) vary except the "ZCMS|" portion, and the contents of the style attribute will also change. Therefore, I don't see how using str_replace will work....
 
LVL 29

Expert Comment

by:TeRReF
ID: 18031638
<?php

$theWebPage = '
<html><head><title>The Page</title></head>
<body>
<h2>A Title</h2>
<img alt="ZCMS|SCRIPTURE|1COL1ROW" src="./scripture.gif" style="left: 507px; position: absolute; top: 85px;" />
<p><strong>Some descriptive text</strong></p>
<img src="this.gif" alt="An image" />
<p>$99.99</p>
<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" style="left: 18px; position: absolute; top: 127px;" />
</body>
</html>
';

$replacement = 'Blah';
$theWebPage = preg_replace('/alt="ZCMS/', 'alt="'.$replacement, $theWebPage);
print $theWebPage;

?>

Author Comment

by:godrifle
ID: 18031911
Thanks TeRReF. I think I see where my original question is falling short in describing what I'm looking for.

I want to parse through the $theWebPage string with PHP and locate all instances of <img> tags that have the special alt="ZCMS|..." substring, and replace THE ENTIRE <IMG> TAG STRING with something else.

In a nutshell, I want to find every image tag that matches the initial <img alt="ZCMS| portion, and ignore all image tags that don't match...

Most likely this is a regex exercise, which I'm very poor at.

This pseudo-code may explain it better:

foreach(image tag that contains the substring "<img alt="ZCMS|" in $theWebPage){
    perform replacement of the entire image tag
}
 
LVL 3

Expert Comment

by:jmsloan
ID: 18031967
Here, this may work for you. Test it out.

<?php

function ChangeText($var){

   $text1 = "ZCMS|SCRIPTURE|1COL1ROW";
   $text2 = "ZCMS|NEWS|3COL1ROW";

   $tmpval = substr_count($var, 'ZCMS|');
   if($tmpval > 0){
      for($i=1;$i<=$tmpval;$i++){
         $pos1 = strpos($var, 'ZCMS|');    
         $var2 = substr($var, $pos1);  
         $pos2 = strpos($var2, '"');    
         $str = substr($var, $pos1, $pos2);

         if($str == "$text1"){
            $var = str_replace("$text1", "replace with this1", "$var");
         }else if($str == "$text2"){
            $var = str_replace("$text2", "replace with this2", "$var");
         }
      }
   }

return $var;

}

$html = '<html><head><title>The Page</title></head>
         <body>
         <h2>A Title</h2>
         <img alt="ZCMS|SCRIPTURE|1COL1ROW" src="./scripture.gif" style="left: 507px; position:
            absolute; top: 85px;" />
         <p><strong>Some descriptive text</strong></p>
         <img src="this.gif" alt="An image" />
         <p>$99.99</p>
         <img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" style="left: 18px; position: absolute; top: 127px;" />
         </body>
         </html>';

$newhtml = ChangeText("$html");

echo "$newhtml";

?>
 

Author Comment

by:godrifle
ID: 18032139
Thanks jmsloan and others. I suspect I posted my last comment while you were composing your reply. Please check it out as it sheds better light on why your solution doesn't work.

Once the darn <img> tags are caught, I intend to parse through each one and take actions based upon position and the other values inside the alt attribute. I'm nearly positive this is a regex exercise, but I don't know how to cope with that! ;-)
 
LVL 9

Expert Comment

by:tolgaong
ID: 18035278
<?php
$theWebPage = <<< WEBPAGE
<html><head><title>The Page</title></head>
<body>
<h2>A Title</h2>
<img alt="ZCMS|SCRIPTURE|1COL1ROW" src="./scripture.gif" style="left: 507px; position: absolute; top: 85px;" />
<p><strong>Some descriptive text</strong></p>
<img src="this.gif" alt="An image" />
<p>$99.99</p>
<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" style="left: 18px; position: absolute; top: 127px;" />
</body>
</html>
WEBPAGE;
$theWebPage = preg_replace_callback('/([\w]+)\\|([\w]+)\\|([\w\d]+)/', 'changeAlt', $theWebPage);
echo $theWebPage;

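function changeAlt($key){
      // $key[0] is the full match; $key[1], $key[2] and $key[3] are the pipe-separated parts.
      //mysql_query("select * from table where field1='{$key[1]}' and field2='{$key[2]}' and field3='{$key[3]}'");
      // other method;
      
      //$array[$key[1]][$key[2]][$key[3]];
      $array=array();
      $array["ZCMS"]["SCRIPTURE"]["1COL1ROW"]="data1";
      $array["ZCMS"]["NEWS"]["3COL1ROW"]="data2";
      // Leave unknown a|b|c strings untouched instead of replacing them with nothing.
      if(!isset($array[$key[1]][$key[2]][$key[3]])){
            return $key[0];
      }
      $data=$array[$key[1]][$key[2]][$key[3]];
      
      return $data;
      }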
?>
 
LVL 29

Expert Comment

by:TeRReF
ID: 18035331
How about this: it looks for image tags with the ZCMS string and stores them in the $matches array. It also stores the offset, so you know where each match occurred in the string. You can then use the $matches array to replace them with whatever you want...

<?php

$theWebPage = '
<html><head><title>The Page</title></head>
<body>
<h2>A Title</h2>
<img alt="ZCMS|SCRIPTURE|1COL1ROW" src="./scripture.gif" style="left: 507px; position: absolute; top: 85px;" />
<p><strong>Some descriptive text</strong></p>
<img src="this.gif" alt="An image" />
<p>$99.99</p>
<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" style="left: 18px; position: absolute; top: 127px;" />
</body>
</html>
';

preg_match_all('/<img.*?alt="ZCMS[^>]+>/', $theWebPage, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);

?>
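
As a rough sketch of how the offsets could be used: each entry of $matches[0] is a pair of matched text and byte offset, so the tags can be swapped out one at a time with substr_replace(). Walking the list from the last match to the first keeps the earlier offsets valid. The shortened page and the replacement markup below are placeholders for illustration, not part of the original question.

```php
<?php
// Shortened sample input modelled on the page from the question.
$theWebPage = '<h2>A Title</h2>
<img alt="ZCMS|SCRIPTURE|1COL1ROW" src="./scripture.gif" style="left: 507px; position: absolute; top: 85px;" />
<img src="this.gif" alt="An image" />
<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" style="left: 18px; position: absolute; top: 127px;" />';

preg_match_all('/<img.*?alt="ZCMS[^>]+>/', $theWebPage, $matches, PREG_OFFSET_CAPTURE);

// Placeholder markup (an assumption); real code would build this per tag.
$replacement = '<div class="zcms-block"></div>';

// Replace from the end of the string backwards so that the offsets of
// the matches still to be processed are not shifted by earlier edits.
foreach (array_reverse($matches[0]) as $match) {
    list($tag, $offset) = $match;
    $theWebPage = substr_replace($theWebPage, $replacement, $offset, strlen($tag));
}

echo $theWebPage, "\n";
```

Inside the loop, $tag holds the full text of the matched <img> tag, so it can be inspected before deciding what to put in its place.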
 
LVL 29

Accepted Solution

by:
TeRReF earned 240 total points
ID: 18035402
Of course, if you want to replace all matches with the same value, you can just do this:

<?php

$theWebPage = '
<html><head><title>The Page</title></head>
<body>
<h2>A Title</h2>
<img alt="ZCMS|SCRIPTURE|1COL1ROW" src="./scripture.gif" style="left: 507px; position: absolute; top: 85px;" />
<p><strong>Some descriptive text</strong></p>
<img src="this.gif" alt="An image" />
<p>$99.99</p>
<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" style="left: 18px; position: absolute; top: 127px;" />
</body>
</html>
';

$replacement = 'blah';
$theWebPage = preg_replace('/<img.*?alt="ZCMS[^>]+>/', $replacement, $theWebPage);
print($theWebPage);

?>
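
One caveat with the pattern above: .*? can run past the end of a tag, so if an ordinary <img> and a ZCMS <img> sit on the same line, the match starts at the first one and swallows both. A possible variation, sketched below, uses [^>]* so the match stays inside a single tag, and preg_replace_callback() so each tag can get its own replacement. It assumes the alt value always has the form ZCMS|TYPE|LAYOUT as in the sample page; renderBlock() is a made-up helper, and the closure syntax needs PHP 5.3 or later.

```php
<?php
// Hypothetical renderer: builds replacement markup from the TYPE and
// LAYOUT fields of the alt value. Not part of the original thread.
function renderBlock($type, $layout)
{
    return '<div class="zcms ' . strtolower($type) . '" data-layout="' . $layout . '"></div>';
}

// Both tags share one line to show that [^>]* never leaves the tag it started in.
$html = '<img src="this.gif" alt="An image" />'
      . '<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" style="left: 18px;" />';

// Capture group 1 is the type, group 2 the layout.
$pattern = '/<img\b[^>]*\balt="ZCMS\|([^"|]*)\|([^"|]*)"[^>]*>/';

$result = preg_replace_callback($pattern, function ($m) {
    // $m[0] is the whole tag; only the captured fields are used here.
    return renderBlock($m[1], $m[2]);
}, $html);

echo $result, "\n";
```

The ordinary image tag is left alone and only the ZCMS tag is replaced, even though both are on the same line.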
 

Author Comment

by:godrifle
ID: 18037109
TeRReF, you're right on it. Thank you. One last question before I award you the points: do you see any flaws in the following pattern to match the entire style="..." string?

preg_match('/style=".*?[^"]+"/'

Thanks so very much for pointing me in the right direction!
 
LVL 29

Expert Comment

by:TeRReF
ID: 18037571
This is better:
preg_match('/style="[^"]+"/'

since [^"]+ already matches any run of characters that are not a double quote, so the .*? in front of it is unnecessary.
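
For what it's worth, here is a small sketch of how that pattern might be used to pull the left and top values out of one matched tag. The capture group around [^"]+ and the $position array are additions for illustration, and the splitting assumes simple "property: value;" declarations like those in the sample page.

```php
<?php
$tag = '<img alt="ZCMS|SCRIPTURE|1COL1ROW" src="./scripture.gif" style="left: 507px; position: absolute; top: 85px;" />';

$position = array();
if (preg_match('/style="([^"]+)"/', $tag, $m)) {
    // $m[1] is the attribute value without the surrounding quotes.
    foreach (explode(';', $m[1]) as $declaration) {
        if (strpos($declaration, ':') === false) {
            continue; // skips the empty piece after the final semicolon
        }
        list($property, $value) = explode(':', $declaration, 2);
        $position[trim($property)] = trim($value);
    }
}

print_r($position);
```

After this runs, $position['left'] and $position['top'] hold the pixel values as strings, units included.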
 

Author Comment

by:godrifle
ID: 18039001
Thank you TeRReF. You've been a great help! And thank you to everyone else who commented.
 
LVL 29

Expert Comment

by:TeRReF
ID: 18039061
You're welcome :)