# Get body part

Hi all,
I would like to read an HTML file and store only the body part in a variable, without the HTML tags. How can I do this?
Here is the HTML:
<HTML><HEAD><TITLE>Mathematics Enrichment Workshop: The Perfect Number Journey</TITLE>
<BODY bgColor=white>
<CENTER>

<P>Mathematicians and nonmathematicians have been fascinated for centuries by
the properties and patterns of numbers. They have noticed that some numbers are
equal to the sum of all of their <FONT color=red><B>factors</B></FONT> (not
including the number itself). The smallest such example is <FONT
color=red>6</FONT>, since <FONT color=red>6</FONT> = 1 + 2 + 3. Such numbers are
called <FONT color=red><B>perfect numbers</B></FONT>.
<P>The search for perfect numbers began in ancient times. The first three
perfect numbers: <FONT color=red>6</FONT>, <FONT color=red>28</FONT> and <FONT
color=red>496</FONT> were known to the ancient mathematicians since the time of
Pythagoras (circa 500 BC).
<P>
</BODY></HTML>

Thanks

Commented:
r_kar,

Here is what you can try:

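```perl
use strict;
use warnings;

my $htmlFile = "/path/to/html/file.html";

# Open the HTML file and slurp it into one string
# (local $/ turns off line splitting only inside the do block)
open(my $html, '<', $htmlFile) or die "Cannot open HTML file $htmlFile: $!\n";
my $content = do { local $/; <$html> };
# Close the file
close($html);

# Extract the body part (/i: case-insensitive, /s: '.' also matches newlines)
my $body = '';
$body = $1 if $content =~ /<body[^>]*>(.*?)<\/body[^>]*>/is;

# Remove the other HTML tags from the body
$body =~ s/<\/?[^>]+>//g;

print $body;
```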

Hope this helps. Please let me know.

Aman
:-)
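The regex approach above can be wrapped in a reusable function that works on a string, which also makes it easy to test. This is a minimal sketch, not the poster's code: the name `body_text` is hypothetical, and the extra whitespace-collapsing step is an assumption about how you want the text laid out. It assumes reasonably simple HTML (no `>` inside attribute values, no comments or `<script>` blocks containing tag-like text), and it leaves entities such as `&amp;` undecoded.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text inside <body>...</body> with tags removed.
# Regex-based sketch; assumes simple HTML without '>' inside attribute values.
sub body_text {
    my ($html) = @_;
    # /i: <BODY> and <body> both match; /s: '.' also matches newlines
    my ($body) = $html =~ /<body[^>]*>(.*?)<\/body[^>]*>/is
        or return '';           # no body found
    $body =~ s/<[^>]*>//g;      # drop remaining tags, including ones split across lines
    $body =~ s/\s+/ /g;         # collapse runs of spaces and newlines into one space
    $body =~ s/^ | $//g;        # trim the leading and trailing space
    return $body;
}

my $html = <<'END_HTML';
<HTML><HEAD><TITLE>Perfect numbers</TITLE>
<BODY bgColor=white>
<P>The smallest such example is <FONT
color=red>6</FONT>, since <FONT color=red>6</FONT> = 1 + 2 + 3.
</BODY></HTML>
END_HTML

print body_text($html), "\n";
```

Note that the `<FONT` tag broken across two lines is still removed, because `[^>]*` also matches newlines.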

Commented:
You could also use HTML::Parser.

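```perl
package MyParser;
use strict;
use warnings;
use parent 'HTML::Parser';

# State shared by the callback methods below
my ($inbody, $body);

# Called for every start tag; the tag name is lower-cased by default
sub start {
    my ($self, $tagname) = @_;
    $inbody = 1 if $tagname eq 'body';
}

sub end {
    my ($self, $tagname) = @_;
    $inbody = 0 if $tagname eq 'body';
}

# Called for the text between tags; keep it only while inside <body>
sub text {
    my ($self, $text) = @_;
    $body .= $text if $inbody;
}

sub GetBodyFromFile {
    my ($self, $filename) = @_;
    ($inbody, $body) = (0, '');
    $self->parse_file($filename) or die "Cannot parse $filename: $!\n";
    return $body;
}

package main;

my $p = MyParser->new();
print $p->GetBodyFromFile("example.html");
```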
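The same idea can also be expressed with HTML::Parser's version-3 handler interface instead of subclassing. This is a sketch under the assumption that the HTML::Parser module (from CPAN, not core Perl) is installed; `body_text_parsed` is a hypothetical name. The `dtext` argument asks the parser for decoded text, so entities such as `&amp;` come back as `&`, which the regex version does not do.

```perl
use strict;
use warnings;
use HTML::Parser 3;

# Hypothetical helper: collect decoded text only while inside <body>...</body>,
# using version-3 handlers rather than a subclass.
sub body_text_parsed {
    my ($html) = @_;
    my ($in_body, $text) = (0, '');
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub { $in_body = 1 if $_[0] eq 'body' }, 'tagname' ],
        end_h   => [ sub { $in_body = 0 if $_[0] eq 'body' }, 'tagname' ],
        text_h  => [ sub { $text .= $_[0] if $in_body },      'dtext'   ],
    );
    $p->parse($html);
    $p->eof;    # flush any text the parser is still buffering
    return $text;
}

print body_text_parsed('<html><body><p>6 = 1 + 2 + 3 &amp; 28 is next</p></body></html>'), "\n";
```

For a file instead of a string, `$p->parse_file($filename)` can be used in place of the `parse`/`eof` pair.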

Commented:
r_kar,

Did you get the solution?

Aman.

Commented:
Amandeep,
Excellent answer, thank you.
I wasn't able to log in for the past week.
Sorry for the delay.
Thanks

Commented:
r_kar,

I am glad I could help you and that you got a working solution.

I think everyone had trouble logging in to EE over the past week.

Cheers,
Aman
:-)