godrifle
asked on
Parse through HTML to find and replace blocks
I have a block of HTML, shown below:
$theWebPage = <<< WEBPAGE
<html><head><title>The Page</title></head>
<body>
<h2>A Title</h2>
<img alt="ZCMS|SCRIPTURE|1COL1ROW" src="./scripture.gif" style="left: 507px; position: absolute; top: 85px;" />
<p><strong>Some descriptive text</strong></p>
<img src="this.gif" alt="An image" />
<p>$99.99</p>
<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" style="left: 18px; position: absolute; top: 127px;" />
</body>
</html>
WEBPAGE;
I want to parse through the $theWebPage string with PHP and locate all instances of <img> tags that have the special alt="ZCMS|..." substring, and replace it in its entirety with something else. Note, however, that this alt tag can have multiple values, depending on the type of content being replaced, and the style attributes (left and top) will invariably be different for each. The one constant amongst images to be replaced is the alt="ZCMS| portion...
Thank you!
$newwebpage = str_replace($oldtext, $newtext, $theWebPage);
ASKER
Thanks jmsloan. I have no way of knowing precisely what the complete needle string is. Everything in the alt attribute can (and will) vary except the "ZCMS|" portion, and the contents of the style attribute will also change. Therefore, I don't see how using str_replace will work....
<?php
$theWebPage = '
<html><head><title>The Page</title></head>
<body>
<h2>A Title</h2>
<img alt="ZCMS|SCRIPTURE|1COL1ROW" src="./scripture.gif" style="left: 507px; position: absolute; top: 85px;" />
<p><strong>Some descriptive text</strong></p>
<img src="this.gif" alt="An image" />
<p>$99.99</p>
<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" style="left: 18px; position: absolute; top: 127px;" />
</body>
</html>
';
$replacement = 'Blah';
$theWebPage = preg_replace('/alt="ZCMS/', 'alt="'.$replacement, $theWebPage);
print $theWebPage;
?>
ASKER
Thanks TeRReF. I think I see where my original question is falling short in describing what I'm looking for.
I want to parse through the $theWebPage string with PHP and locate all instances of <img> tags that have the special alt="ZCMS|..." substring, and replace THE ENTIRE <IMG> TAG STRING with something else.
In a nutshell, I want to search for all that match the initial <img alt="ZCMS| portion, and ignore all image tags that don't have that match...
Most likely this is a regex exercise, which I'm very poor at.
This pseudo-code may explain it better:
foreach(image tag that contains the substring "<img alt="ZCMS|" in $theWebPage){
perform replacement of the entire image tag
}
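The pseudo-code above maps fairly directly onto preg_replace_callback. What follows is only a minimal sketch, assuming the alt attribute always comes first in the tag (as in the sample page) and that no attribute value contains a ">" character. The function name replaceZcmsTags and the placeholder comment it returns are hypothetical, not part of the original page:

```php
<?php
// Hypothetical helper: replaces every <img> tag whose alt starts with "ZCMS|".
function replaceZcmsTags(string $html): string
{
    // "<img", then alt="ZCMS|...", then anything up to the closing ">".
    $pattern = '/<img\s+alt="ZCMS\|([^"]*)"[^>]*>/i';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[0] is the whole tag, $m[1] is the text after "ZCMS|".
        return '<!-- replaced: ' . $m[1] . ' -->';
    }, $html);
}

$page = '<h2>A Title</h2>'
      . '<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" style="left: 18px;" />'
      . '<img src="this.gif" alt="An image" />';

echo replaceZcmsTags($page), "\n";
```

The ordinary image tag passes through untouched because its alt attribute does not start with ZCMS|.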
Here, this may work for you. Test it out.
<?php
function ChangeText($var){
    // The two alt values this function knows how to replace.
    $text1 = "ZCMS|SCRIPTURE|1COL1ROW";
    $text2 = "ZCMS|NEWS|3COL1ROW";
    $tmpval = substr_count($var, 'ZCMS|');
    if($tmpval > 0){
        for($i=1;$i<=$tmpval;$i++) {
            // Extract the alt value: from the first remaining "ZCMS|" up to the closing quote.
            $pos1 = strpos($var, 'ZCMS|');
            $var2 = substr($var, $pos1);
            $pos2 = strpos($var2, '"');
            $str = substr($var, $pos1, $pos2);
            if($str == $text1){
                $var = str_replace($text1, "replace with this1", $var);
            }else if($str == $text2){
                $var = str_replace($text2, "replace with this2", $var);
            }
        }
    }
    return $var;
}
$html = '<html><head><title>The Page</title></head>
<body>
<h2>A Title</h2>
<img alt="ZCMS|SCRIPTURE|1COL1ROW" src="./scripture.gif" style="left: 507px; position: absolute; top: 85px;" />
<p><strong>Some descriptive text</strong></p>
<img src="this.gif" alt="An image" />
<p>$99.99</p>
<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" style="left: 18px; position: absolute; top: 127px;" />
</body>
</html>';
$newhtml = ChangeText($html);
echo $newhtml;
?>
ASKER
Thanks jmsloan and others. I suspect I posted my last comment while you were composing your reply. Please check it out as it sheds better light on why your solution doesn't work.
Once the darn <img> tags are caught, I intend to parse through each one and take actions based upon position and the other values inside the alt attribute. I'm nearly positive this is a regex exercise, but I don't know how to cope with that! ;-)
<?php
$theWebPage = <<< WEBPAGE
<html><head><title>The Page</title></head>
<body>
<h2>A Title</h2>
<img alt="ZCMS|SCRIPTURE|1COL1ROW" src="./scripture.gif" style="left: 507px; position: absolute; top: 85px;" />
<p><strong>Some descriptive text</strong></p>
<img src="this.gif" alt="An image" />
<p>$99.99</p>
<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" style="left: 18px; position: absolute; top: 127px;" />
</body>
</html>
WEBPAGE;
$theWebPage = preg_replace_callback('/([\w]+)\\|([\w]+)\\|([\w\d]+)/', 'changeAlt', $theWebPage);
echo $theWebPage;
function changeAlt($key){
    //mysql_query("select * from table where field1='{$key[1]}' and field2='{$key[2]}' and field3='{$key[3]}'");
    // other method;
    //$array[$key[1]][$key[2]][$key[3]];
    $array=array();
    $array["ZCMS"]["SCRIPTURE"]["1COL1ROW"]="data1";
    $array["ZCMS"]["NEWS"]["3COL1ROW"]="data2";
    $data=$array[$key[1]][$key[2]][$key[3]];
    return $data;
}
?>
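The callback above looks up a replacement by the three pipe-separated fields of the alt value. As a rough sketch of the same lookup outside a regex, the value can be split with explode and checked against a table. The function name lookupBlock, the $blocks table, and the PREFIX|TYPE|LAYOUT naming are assumptions made for illustration only:

```php
<?php
// Hypothetical lookup table keyed by the three pipe-separated fields.
$blocks = [
    'ZCMS' => [
        'SCRIPTURE' => ['1COL1ROW' => 'data1'],
        'NEWS'      => ['3COL1ROW' => 'data2'],
    ],
];

// Hypothetical helper: returns the replacement for an alt value, or null.
function lookupBlock(array $blocks, string $alt): ?string
{
    $parts = explode('|', $alt);
    if (count($parts) !== 3) {
        return null; // not in the assumed PREFIX|TYPE|LAYOUT shape
    }
    [$prefix, $type, $layout] = $parts;

    // The ?? operator avoids an "undefined index" notice for unknown keys.
    return $blocks[$prefix][$type][$layout] ?? null;
}

var_dump(lookupBlock($blocks, 'ZCMS|NEWS|3COL1ROW'));   // string(5) "data2"
var_dump(lookupBlock($blocks, 'ZCMS|VIDEO|1COL1ROW'));  // NULL
```

Returning null for an unknown combination lets the caller decide whether to leave the original tag in place.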
How about this: it looks for image tags with the ZCMS string and stores them in the $matches array. It also stores the offset, so you know where each match occurred in the string. You can then use the $matches array to replace them with whatever you want...
<?php
$theWebPage = '
<html><head><title>The Page</title></head>
<body>
<h2>A Title</h2>
<img alt="ZCMS|SCRIPTURE|1COL1ROW" src="./scripture.gif" style="left: 507px; position: absolute; top: 85px;" />
<p><strong>Some descriptive text</strong></p>
<img src="this.gif" alt="An image" />
<p>$99.99</p>
<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" style="left: 18px; position: absolute; top: 127px;" />
</body>
</html>
';
preg_match_all('/<img.*?alt="ZCMS[^>]+>/', $theWebPage, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>
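One possible way to act on the captured offsets is to splice the replacements in from the last match to the first, so the earlier offsets stay valid while the string changes length. This is a sketch only: the '[block]' placeholder is hypothetical, and the pattern uses [^>]*? where the code above uses .*?, a substitution made here so that a match cannot begin at an earlier, unrelated <img> tag on the same line.

```php
<?php
$html = '<p>a</p>'
      . '<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" />'
      . '<img src="this.gif" alt="An image" />';

// Each entry of $matches[0] is a pair: [matched tag, byte offset].
preg_match_all('/<img[^>]*?alt="ZCMS[^>]+>/', $html, $matches, PREG_OFFSET_CAPTURE);

// Walk the matches from last to first so earlier offsets remain correct.
foreach (array_reverse($matches[0]) as [$tag, $offset]) {
    $replacement = '[block]'; // placeholder; the real content is up to the caller
    $html = substr_replace($html, $replacement, $offset, strlen($tag));
}

echo $html, "\n";
```

The offsets reported by PREG_OFFSET_CAPTURE are byte offsets, which is why strlen and substr_replace (both byte-based) are used together here.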
ASKER
TeRReF, you're right on it. Thank you. One last question before I award you the points. Do you see any flaws in the following pattern to match the entire style="..." string:
preg_match('/style=".*?[^"]+"/'
Thanks so very much for pointing me in the right direction!
This is better:
preg_match('/style="[^"]+"/'
since this will already match any character that's not a "
[^"]+
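To see the simpler pattern at work, here is a small sketch that captures the style value from one of the sample tags and then pulls out the left and top positions. The capture group and the two follow-up patterns are additions for illustration, and they assume the positions are always given in px, as in the sample:

```php
<?php
$tag = '<img alt="ZCMS|NEWS|3COL1ROW" src="./news.gif" '
     . 'style="left: 18px; position: absolute; top: 127px;" />';

// [^"]+ stops at the closing quote on its own, so no leading .*? is needed.
if (preg_match('/style="([^"]+)"/', $tag, $m)) {
    $style = $m[1];

    // Pull out the numeric left/top values (assumes px units).
    preg_match('/\bleft:\s*(\d+)px/', $style, $left);
    preg_match('/\btop:\s*(\d+)px/', $style, $top);

    echo "style: $style\n";
    echo "left: {$left[1]}, top: {$top[1]}\n";
}
```

Note that \bleft: would also match inside a property such as margin-left, so a stricter pattern may be needed if the style strings are less uniform than the sample.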
ASKER
Thank you TeRReF. You've been a great help! And thank you to everyone else who commented.
You're welcome :)