
Get the body part of an HTML file

Hi all,
I would like to read an HTML file and store only the body part in a variable (without the HTML tags). How can I do this?
Here is the HTML:
<HTML><HEAD><TITLE>Mathematics Enrichment Workshop: The Perfect Number Journey</TITLE>
<BODY bgColor=white>
<CENTER>

Mathematicians and nonmathematicians have been fascinated for centuries by
the properties and patterns of numbers. They have noticed that some numbers are
equal to the sum of all of their factors (not
including the number itself). The smallest such example is 6, since 6 = 1 + 2 + 3. Such numbers are
called perfect numbers.
The search for perfect numbers began in ancient times. The first three
perfect numbers: 6, 28 and 496 were known to the ancient mathematicians since the time of
Pythagoras (circa 500 BC).

</BODY></HTML>

Thanks

amandeep

r_kar,

Here is what you can try:

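use strict;
use warnings;

my $htmlFile = "/path/to/html/file.html";

# Open the HTML file (three-argument open with a lexical filehandle)
open(my $fh, '<', $htmlFile) or die "cannot open html file $htmlFile: $!\n";
# Read the entire file at once (slurp) by undefining $/ only inside this block
my $content = do { local $/; <$fh> };
# Close the file
close($fh);

# Extract the body part: /i matches BODY or body, /s lets . match newlines
my $body = '';
if ($content =~ /<body[^>]*>(.*?)<\/body[^>]*>/is) {
    $body = $1;
}
# Remove the other HTML tags from the body
$body =~ s/<\/?[^>]+>//g;

print $body;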
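For reuse, the same regex approach can be wrapped in a subroutine that takes the HTML as a string instead of a file name. This is a minimal sketch: the name `extract_body_text` is hypothetical, and the whitespace clean-up at the end is an extra step that is not part of the answer above. Like any regex-based approach, it assumes simple markup (no `>` inside attribute values, and no comments or scripts in the body).

```perl
use strict;
use warnings;

# Hypothetical helper: return the text inside <body>...</body> with tags removed.
# Returns an empty string when no body element is found.
sub extract_body_text {
    my ($html) = @_;

    # /i: match BODY or body; /s: let . match newlines so the body can span lines
    return '' unless $html =~ /<body[^>]*>(.*?)<\/body[^>]*>/is;
    my $body = $1;

    # Remove the remaining start and end tags inside the body
    $body =~ s/<\/?[^>]+>//g;

    # Extra step (assumption, not in the original answer): collapse runs of
    # whitespace into single spaces and trim both ends
    $body =~ s/\s+/ /g;
    $body =~ s/^ //;
    $body =~ s/ $//;
    return $body;
}

my $html = <<'HTML';
<HTML><HEAD><TITLE>Perfect Numbers</TITLE>
<BODY bgColor=white>
<CENTER>
The smallest such example is 6, since 6 = 1 + 2 + 3.
</BODY></HTML>
HTML

print extract_body_text($html), "\n";
```

With the sample above this prints the single sentence from the body, with the `<CENTER>` tag and the surrounding newlines removed.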

Hope this helps. Please let me know.

Aman
:-)

ASKER CERTIFIED SOLUTION

amandeep


clockwatcher

You could also use HTML::Parser.

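package MyParser;
use strict;
use warnings;
use parent 'HTML::Parser';

# Called for every start tag; $tag is the lowercased tag name
sub start {
    my ($self, $tag) = @_;
    $self->{inbody} = 1 if $tag eq 'body';
}

# Called for every end tag
sub end {
    my ($self, $tag) = @_;
    $self->{inbody} = 0 if $tag eq 'body';
}

# Called for text between tags; keep it only while inside <body>
sub text {
    my ($self, $text) = @_;
    $self->{body} .= $text if $self->{inbody};
}

# Parse the file and return the collected body text
sub GetBodyFromFile {
    my ($self, $filename) = @_;
    $self->{body}   = '';
    $self->{inbody} = 0;
    $self->parse_file($filename) or die "cannot parse $filename: $!\n";
    return $self->{body};
}

package main;

my $p = MyParser->new();
print $p->GetBodyFromFile("example.html");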
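HTML::Parser version 3 also lets you register handler callbacks instead of subclassing. The sketch below assumes HTML::Parser 3.x is installed; the helper name `body_text_from_string` is hypothetical. The `tagname` argspec passes the lowercased tag name, and `dtext` passes text with entities such as `&amp;` already decoded.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: collect the text found between <body> and </body>
sub body_text_from_string {
    my ($html) = @_;
    my $in_body = 0;
    my $body    = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # "tagname" passes the lowercased element name to the handler
        start_h => [ sub { $in_body = 1 if $_[0] eq 'body' }, 'tagname' ],
        end_h   => [ sub { $in_body = 0 if $_[0] eq 'body' }, 'tagname' ],
        # "dtext" passes the text with HTML entities decoded
        text_h  => [ sub { $body .= $_[0] if $in_body }, 'dtext' ],
    );
    $p->parse($html);
    $p->eof;    # signal end of input so any buffered text is flushed
    return $body;
}

my $html = '<html><head><title>T</title></head>'
         . '<body bgcolor=white><b>6 = 1 + 2 + 3</b> &amp; 28</body></html>';
print body_text_from_string($html), "\n";
```

Because the handlers are closures over `$in_body` and `$body`, no package globals or subclass are needed, and the title text in the head is skipped.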

amandeep

r_kar,

Did you get the solution?

Please let us know.

Aman.

r_kar

ASKER

amandeep,
Excellent performance.
For the past week I wasn't able to log in.
Sorry for the delay.
Thanks

amandeep

r_kar,

I am glad I could help you and that you got a working solution.

I think everyone had trouble logging in to EE for the past week.

Cheers,
Aman
:-)