thaimin
asked on
Substituting occurrences of a word only if they are not in a tag
I currently have a program that takes an entire HTML document and replaces one word with another. The problem is that if the word also appears in the src attribute of an image tag, the image source gets rewritten too and no longer points to the right file. The line doing the replacement is currently:
$d =~ s/$lookfor/$replace/gim;
How could I make it so it won't replace the word if it's inside a tag, or between script, title, or other such tags?
You can use an HTML library to parse the document. That would be easier than trying to write one really complex regular expression. I have some code for that somewhere and will post it soon.
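For comparison, a regex-only route does not have to be a single complex expression. The following is a minimal sketch, not the parser-based solution from this thread: it splits the document on tags and substitutes only in the pieces between them. The helper name replace_outside_tags and the sample HTML are made up for illustration, and the approach assumes no ">" appears inside attribute values. It also still touches text between script or title tags, which is one reason a real parser is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $lookfor with $replace only in the text
# between tags, never inside a tag such as <img src="...">.
sub replace_outside_tags {
    my ($html, $lookfor, $replace) = @_;

    # The capturing group makes split return the tags as list items too.
    my @parts = split /(<[^>]*>)/, $html;

    for my $part (@parts) {
        next if $part =~ /^</;    # a tag: leave it untouched
        $part =~ s/\Q$lookfor\E/$replace/gi;
    }
    return join '', @parts;
}

my $html = '<p>cat photo <img src="cat.png"> of a Cat</p>';
print replace_outside_tags($html, 'cat', 'dog'), "\n";
# prints: <p>dog photo <img src="cat.png"> of a dog</p>
```

The \Q...\E keeps any regex metacharacters in the search word from being interpreted, which the original one-line substitution does not do.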
ASKER CERTIFIED SOLUTION
ASKER
Thanks, it seems to be working now. I'm still having a few problems, though, and it might be that I'm taking the wrong approach. Basically, what I want to do is "highlight" every occurrence of certain words in an HTML document. The end of my script is now:
my $tree = HTML::TreeBuilder->new();
$tree->parse($d); #HTML source is in $d
$tree->eof();
my $body = $tree->look_down('_tag', 'body');
for (my $i = 0; $i < scalar(@finalValues); $i++) {
$lookfor = $finalValues[$i];
$replace = "<b style=\"color:black;background-color:$refrence{$lookfor}\">$lookfor</b>"; #This is the highlight
foreach my $item_r ($body->content_refs_list) {
next if ref $$item_r;
$$item_r =~ s/$lookfor/$replace/gim;
}
}
print $tree->as_HTML;
$tree = $tree->delete;
The problems I'm having are these. First, as_HTML turns the < and > of my inserted tag into &..., so I would either need to make each ~text element a ~literal or split the text apart and push the <b> tag in as an element. Second, content_refs_list only returns immediate children, not children of children, so I only get the text that's directly under the body.
Thanks for helping so far. I would really appreciate it if you could answer these too. Thanks again.
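One way to reach text at every depth is to call content_refs_list recursively on each child element. The sketch below assumes HTML::TreeBuilder is installed; the helper name for_each_text, the sample document, and the choice to skip script and style elements are illustrative assumptions, not part of the original script.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: call $callback with a reference to every text
# segment below $element, descending into child elements as it goes.
sub for_each_text {
    my ($element, $callback) = @_;
    for my $item_r ($element->content_refs_list) {
        if (ref $$item_r) {
            # A child element: skip subtrees whose text is not prose.
            next if $$item_r->tag =~ /^(?:script|style)$/;
            for_each_text($$item_r, $callback);
        }
        else {
            # A plain text segment: hand the reference to the callback.
            $callback->($item_r);
        }
    }
}

my $tree = HTML::TreeBuilder->new;
$tree->parse('<html><head><title>cat</title></head><body>'
    . '<p>a cat <i>and a cat</i></p><img src="cat.png">'
    . '<script>var cat = 1;</script></body></html>');
$tree->eof;

my $body = $tree->look_down('_tag', 'body');
for_each_text($body, sub { my $text_r = shift; $$text_r =~ s/cat/dog/gi });

my $out = $tree->as_HTML;
print $out, "\n";
$tree = $tree->delete;
```

Because the callback receives a reference, the substitution changes the text in the tree itself, the same way $$item_r does in the loop above. Attribute values such as the img src are never visited, since they are not part of any content list.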
ASKER
Actually, I found a way to turn all the ~text objects into ~literal objects, and it worked well:
$tree->objectify_text();
my @texts = $body->look_down("_tag","~text");
for (my $i = 0; $i < scalar(@texts); $i++) {
$texts[$i]->tag('~literal');
}
But I would still appreciate it if you could help with searching the grandchildren. Thanks a lot.
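Since look_down searches the whole subtree rather than only immediate children, the ~text objects it returns already include text nested at any depth, so the highlight can be applied to them directly. The following is a rough sketch under that assumption. The sample document, the search word, and the yellow highlight colour are made up for illustration. Because ~literal content is written out unescaped, the sketch escapes the existing text with HTML::Entities before inserting the tag.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Entities qw(encode_entities);

my $lookfor = 'cat';    # illustrative search word

my $tree = HTML::TreeBuilder->new;
$tree->parse('<html><body><p>a cat <i>and a Cat</i></p>'
    . '<img src="cat.png"></body></html>');
$tree->eof;
$tree->objectify_text;    # text segments become ~text pseudo-elements

my $body = $tree->look_down('_tag', 'body');

# look_down walks all descendants, so nested text is included.
for my $text ($body->look_down('_tag', '~text')) {
    # Escape the original text first, since ~literal is output raw.
    my $string = encode_entities($text->attr('text'), '<>&');
    # $1 keeps the matched word in its original capitalisation.
    $string =~ s{(\Q$lookfor\E)}{<b style="background-color:yellow">$1</b>}gi;
    $text->attr('text', $string);
    $text->tag('~literal');
}

my $out = $tree->as_HTML;
print $out, "\n";
$tree = $tree->delete;
```

Doing the substitution after objectify_text, on the text attribute of each object, avoids the need for content_refs_list entirely.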
ASKER
This really worked out, thanks a lot. I now have one more question about HTML::Element, and there are more points available if you can help me answer it.
It's at: https://www.experts-exchange.com/questions/20403280/A-way-to-use-content-refs-list-to-get-all-children-and-grandchildren.html