trevor1940
asked on
Perl: Cleaning meta tags using a regex
After building an HTML page, I need to clean up the meta tags by removing the forward slash at the end of each tag.
I can't use a global replace such as =~ s{\/>}{>}, because <hr /> and <img src="example.com" /> are both valid, so I need to make sure I'm only removing the slash from the end of the generated meta tags.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Element;

my $body = HTML::TreeBuilder->new_from_file(*DATA);
#print $body->as_HTML('<>&', ' ', {}) . "\n";

my %meta = (
    "Author" => "J K Rolling",
    "title"  => "Harry Potter and the Philosopher's Stone",
);

my $head = $body->find_by_tag_name('_tag', 'head');
for my $m (sort keys %meta) {
    my $m_el = HTML::Element->new('meta');
    # keep name content in correct order
    $m_el->attr('0name', $m);
    $m_el->attr('1content', $meta{$m});
    $head->push_content($m_el);
}

my $CloneOut = $body->as_HTML('<>&', ' ', {});
# clean up / remove 0 & 1
$CloneOut =~ s/0name/name/ig;
$CloneOut =~ s/1content/content/ig;

while (<$CloneOut>) {    ## Errors here on test script: readline() on unopened filehandle
    my $line = $1;
    if ($line =~ m/meta/i) {
        $line =~ s{\"\s+\/>}{\">};    ## is this correct?
    }
    print $line;
}
__DATA__
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Hello World</title>
</head>
<body>
<h1> Books by J K Rolling</h1>
</body>
</html>
This outputs:
<meta name="Author" content="J K Rolling" />
I need this:
<meta name="Author" content="J K Rolling">
Maybe try
my $y = '<meta name="Author" content="J K Rolling" />';
$y =~ s/\s*\/(>)/$1/;
ASKER
Your regex seems to work; however, I'm still getting this error:
while(<$CloneOut>){ ## Errors here readline() on unopened filehandle
$CloneOut isn't a filehandle, it's a scalar, so how do I make sure I'm only changing the meta data and not the HTML body?
Sorry, I can't help further; Perl is not my speciality.
Why are you using the diamond operator and why are you using a while loop?
If you remove the diamond operator, that will fix the "readline() on unopened filehandle" error; then you'll need to fix the infinite loop that your while loop creates.
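Another way out, not suggested in the thread and so offered only as an aside, is to keep the while loop and the diamond operator but read from a real filehandle: Perl can open a reference to a scalar as an in-memory file, so <$fh> then returns the string one line at a time. A minimal sketch, using a hypothetical sample string in place of the as_HTML output:

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the output of $body->as_HTML(...)
my $CloneOut = qq{<head>\n<meta name="Author" content="J K Rolling" />\n</head>\n<body>\n<hr />\n</body>\n};

# Open the string itself as a read-only in-memory filehandle
open my $fh, '<', \$CloneOut or die "Cannot open string as filehandle: $!";

my $cleaned = '';
while (my $line = <$fh>) {
    # The pattern only matches <meta ... /> tags, so the <hr /> line passes through unchanged
    $line =~ s{(<meta\b[^>]*?)\s*/>}{$1>}gi;
    $cleaned .= $line;
}
close $fh;

print $cleaned;
```

Because the loop now reads from $fh, it ends at end-of-file instead of spinning forever on a string that never changes.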
ASKER
fix the infinite loop that your while loop creates
How do I do that?
You first need to ask yourself why you are using a loop.
$CloneOut is a scalar that holds a string of HTML. When it is used as the while condition, you are only testing it for truthfulness, and since it never changes it always evaluates to true, so the loop never ends.
Instead of the loop, you could simply apply the regex to the scalar (making sure you use the g modifier). If you want to use a loop, then you need to split the string into separate lines (i.e. turn it into an array or list) and loop over each of them.
ASKER
Instead of the loop, you could simply apply the regex to the scalar
$CloneOut =~ s/\s*\/(>)/$1/g;
doesn't work because
<hr /> and <img src="mypic.jpg" /> are both valid
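Anchoring the pattern on the opening <meta lets a single global substitution run over the whole string without touching <hr /> or <img ... />. A minimal sketch, assuming (as the '<>&' argument to as_HTML should ensure) that no attribute value contains a literal > character; the sample page is hypothetical, and the thread's unanchored pattern is kept for comparison:

```perl
use strict;
use warnings;

# Hypothetical page standing in for $CloneOut
my $CloneOut = <<'HTML';
<head>
<meta name="Author" content="J K Rolling" />
<meta name="title" content="Harry Potter and the Philosopher's Stone" />
</head>
<body>
<hr />
<img src="mypic.jpg" />
</body>
HTML

# Unanchored pattern from the thread: strips the slash from every self-closing tag
(my $too_greedy = $CloneOut) =~ s/\s*\/(>)/$1/g;

# Anchored pattern: [^>]*? cannot cross a '>', so each match stays inside a
# single <meta ...> tag; /i also catches <META ... />
(my $meta_only = $CloneOut) =~ s{(<meta\b[^>]*?)\s*/>}{$1>}gi;

print $meta_only;
```

With this pattern the line-by-line loop becomes optional, since the regex itself decides which tags are eligible.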
I'm guessing
split the string into separate lines (i.e. turn it into an array or list) and loop over each of them.
I'd do something like this?
my @CloneOut = split /$/m, $CloneOut;
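A note on that split: /$/m cuts the string just before each newline, so the newlines end up at the start of the following pieces rather than at the end of each line. Splitting on /^/m instead keeps each line's own trailing newline, which makes rejoining a plain join. A minimal sketch of the loop version, using a hypothetical sample string and the meta-anchored pattern so the old "if ($line =~ m/meta/i)" check is no longer needed:

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the as_HTML output
my $CloneOut = qq{<head>\n<meta name="Author" content="J K Rolling" />\n</head>\n<hr />\n};

# /^/m splits at the start of each line, so every element keeps its own "\n"
my @CloneOut = split /^/m, $CloneOut;

for my $line (@CloneOut) {
    # foreach aliases $line to each element, so this edits @CloneOut in place;
    # only <meta ... /> tags match, so the <hr /> line is left alone
    $line =~ s{(<meta\b[^>]*?)\s*/>}{$1>}gi;
}

my $cleaned = join '', @CloneOut;
print $cleaned;
```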
SOLUTION
ASKER CERTIFIED SOLUTION
ASKER
Thank you for your help.
You're welcome, glad I was able to help.
pls try
Regards