r_kar
asked on
Get body part
Hi all,
I would like to read an HTML file and store only the body part in a variable (without the HTML tags). How can I do this?
Here is the html:
<HTML><HEAD><TITLE>Mathematics Enrichment Workshop: The Perfect Number Journey</TITLE>
<BODY bgColor=white>
<CENTER>
<P>Mathematicians and nonmathematicians have been fascinated for centuries by
the properties and patterns of numbers. They have noticed that some numbers are
equal to the sum of all of their <FONT color=red><B>factors</B></FONT> (not
including the number itself). The smallest such example is <FONT
color=red>6</FONT>, since <FONT color=red>6</FONT> = 1 + 2 + 3. Such numbers are
called <FONT color=red><B>perfect numbers</B></FONT>.
<P>The search for perfect numbers began in ancient times. The first three
perfect numbers: <FONT color=red>6</FONT>, <FONT color=red>28</FONT> and <FONT
color=red>496</FONT> were known to the ancient mathematicians since the time of
Pythagoras (circa 500 BC).
<P>
</BODY></HTML>
Thanks
You could also use HTML::Parser.
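package MyParser;
use strict;
use warnings;
use HTML::Parser;
our @ISA = ("HTML::Parser");

# State shared by the callbacks below.
my ($inbody, $body);

# Called for each start tag; $_[1] is the lowercased tag name.
sub start
{
    $inbody = 1 if $_[1] eq "body";
}
# Called for each end tag.
sub end
{
    $inbody = 0 if $_[1] eq "body";
}
# Called for each run of text; keep it only while inside <body>.
sub text
{
    $body .= $_[1] if $inbody;
}
sub GetBodyFromFile
{
    my ($self, $filename) = @_;
    $body = "";
    $self->parse_file($filename);
    return $body;
}
package main;
my $p = MyParser->new();
print $p->GetBodyFromFile("example.html");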
r_kar,
Did you get the solution?
Please let us know..
Aman.
ASKER
amandeep,
Excellent performance.
For the past week I wasn't able to log in.
Sorry for the delay.
Thanks
r_kar,
I am glad I could help you and that you got a working solution.
I think everyone has been facing the login problem at EE for the past week.
Cheers,
Aman
:-)
Here is what you can try:
$htmlFile="/path/to/html/f
undef $/;
#Open the html file
open(HTML,$htmlFile) or die "cannot open html File $! \n";
#read the entire file(Slurp)
$content = <HTML>;
#close the file
close(HTML);
$/="\n";
#extract the body part
$content =~ /<body[^>]*>(.*?)<\/body[^
$body=$1;
#remove other html tags from the body
$body=~s/<\/?[^>]+>//g;
print $body;
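
Two lines of the snippet above are cut off in this post (the file path and the end of the body pattern), so here is a minimal, self-contained sketch of the same slurp-and-regex idea. The helper names slurp_file and body_text are made up for illustration, and the /is modifiers and the whitespace cleanup are assumptions rather than part of the original pattern. It runs on an in-memory sample instead of a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!\n";
    local $/;                      # slurp mode; restored when the sub returns
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Hypothetical helper: return the text between <body> and </body>
# with all tags stripped, or undef if no body element is found.
sub body_text {
    my ($html) = @_;
    # /i matches <BODY> as well as <body>; /s lets "." match newlines.
    # These modifiers are an assumption; the original pattern is truncated.
    my ($body) = $html =~ m{<body[^>]*>(.*?)</body[^>]*>}is
        or return;
    $body =~ s/<\/?[^>]+>//g;      # remove the remaining tags
    $body =~ s/\s+/ /g;            # collapse runs of whitespace
    $body =~ s/^\s+|\s+$//g;       # trim both ends
    return $body;
}

# In-memory sample modelled on the HTML in the question.
my $sample = <<'END';
<HTML><HEAD><TITLE>Perfect numbers</TITLE>
<BODY bgColor=white>
<P>The smallest such example is <FONT
color=red>6</FONT>, since 6 = 1 + 2 + 3.
</BODY></HTML>
END

print body_text($sample), "\n";
```

For a real file you would call body_text(slurp_file($htmlFile)). A regex like this is fine for simple pages but can be fooled by comments or by a ">" inside an attribute value, which is where the HTML::Parser approach is safer.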
Hope this helps. Please let me know.
Aman
:-)