
Get body part

r_kar asked:
Hi all,
I would like to read an HTML file and store only the body part (without the HTML tags) in a variable. How can I do this?
Here is the HTML:
<HTML><HEAD><TITLE>Mathematics Enrichment Workshop: The Perfect Number Journey</TITLE>
<BODY bgColor=white>
<CENTER>

<P>Mathematicians and nonmathematicians have been fascinated for centuries by
the properties and patterns of numbers. They have noticed that some numbers are
equal to the sum of all of their <FONT color=red><B>factors</B></FONT> (not
including the number itself). The smallest such example is <FONT
color=red>6</FONT>, since <FONT color=red>6</FONT> = 1 + 2 + 3. Such numbers are
called <FONT color=red><B>perfect numbers</B></FONT>.
<P>The search for perfect numbers began in ancient times. The first three
perfect numbers: <FONT color=red>6</FONT>, <FONT color=red>28</FONT> and <FONT
color=red>496</FONT> were known to the ancient mathematicians since the time of
Pythagoras (circa 500 BC).
<P>
</BODY></HTML>

Thanks

Commented:
r_kar,

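Here is what you can try:

use strict;
use warnings;

my $htmlFile = "/path/to/html/file.html";

# Open the HTML file for reading
open(my $fh, '<', $htmlFile) or die "Cannot open HTML file $htmlFile: $!\n";

# Read the entire file at once (slurp); $/ is undefined only inside the do block
my $content = do { local $/; <$fh> };

# Close the file
close($fh);

# Extract the body part (/i ignores case, /s lets "." match newlines)
my ($body) = $content =~ /<body[^>]*>(.*?)<\/body[^>]*>/is
    or die "No <body> element found in $htmlFile\n";

# Remove the remaining HTML tags from the body
$body =~ s/<\/?[^>]+>//g;

print $body;


Hope this helps. Please let me know.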

Aman
:-)
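
As a small illustration of the regex approach above, here is a minimal sketch that wraps the two substitutions in a function and also collapses the leftover whitespace. The helper name body_text and the shortened sample string are assumptions made for this example, not part of the original answer. Note that this approach does not decode entities such as &amp;, and it can misfire on a ">" inside an attribute value or a comment.

```perl
use strict;
use warnings;

# body_text is a hypothetical helper name, not part of the thread.
sub body_text {
    my ($html) = @_;

    # Capture everything between <body ...> and </body>, case-insensitively;
    # /s lets "." match newlines so the body can span many lines.
    my ($body) = $html =~ /<body[^>]*>(.*?)<\/body[^>]*>/is
        or return;

    # Strip the remaining tags, then collapse runs of whitespace
    # and trim the single space left at either end.
    $body =~ s/<\/?[^>]+>//g;
    $body =~ s/\s+/ /g;
    $body =~ s/^ | $//g;

    return $body;
}

# Shortened version of the page from the question.
my $sample = <<'HTML';
<HTML><HEAD><TITLE>Perfect numbers</TITLE>
<BODY bgColor=white>
<P>The smallest such example is <FONT
color=red>6</FONT>, since <FONT color=red>6</FONT> = 1 + 2 + 3.
</BODY></HTML>
HTML

print body_text($sample), "\n";
# prints: The smallest such example is 6, since 6 = 1 + 2 + 3.
```

Returning an empty result when no body is found lets the caller decide whether a missing <body> tag is an error.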

Commented:
You could also use HTML::Parser.

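package MyParser;
use strict;
use warnings;
use HTML::Parser;
our @ISA = ("HTML::Parser");

# Parser state: whether we are inside <body>, and the text collected so far
my ($inbody, $body);

# Called for each start tag; $_[1] is the lowercased tag name
sub start
{
  $inbody = 1 if $_[1] eq "body";
}

# Called for each end tag
sub end
{
  $inbody = 0 if $_[1] eq "body";
}

# Called for each run of text; keep it only while inside <body>
sub text
{
  $body .= $_[1] if $inbody;
}

# Parse the given file and return the text found inside its body
sub GetBodyFromFile
{
  my ($self, $filename) = @_;
  ($inbody, $body) = (0, "");
  $self->parse_file($filename) or die "Cannot parse $filename: $!\n";
  return $body;
}

package main;

my $p = MyParser->new();
print $p->GetBodyFromFile("example.html");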
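
The same idea can also be written without subclassing, using the handler interface of HTML::Parser version 3. The following is only a sketch under that assumption: the function name body_text_from_string and the sample markup are made up for illustration. Requesting "dtext" instead of "text" asks the parser to decode entities such as &lt; before passing the text to the callback.

```perl
use strict;
use warnings;
use HTML::Parser;

# body_text_from_string is a hypothetical helper name used for illustration.
sub body_text_from_string {
    my ($html) = @_;
    my ($in_body, $text) = (0, "");

    my $parser = HTML::Parser->new(
        api_version => 3,
        # "tagname" passes the lowercased tag name to the callback.
        start_h => [ sub { $in_body = 1 if $_[0] eq "body" }, "tagname" ],
        end_h   => [ sub { $in_body = 0 if $_[0] eq "body" }, "tagname" ],
        # "dtext" passes the text with entities such as &amp; decoded.
        text_h  => [ sub { $text .= $_[0] if $in_body }, "dtext" ],
    );

    $parser->parse($html);
    $parser->eof;    # signal end of input so buffered text is flushed
    return $text;
}

print body_text_from_string(
    "<html><head><title>T</title></head>"
  . "<body><p>1 &lt; 2 &amp; <b>6</b> is perfect</p></body></html>"
), "\n";
```

Keeping the state in lexical variables captured by the callbacks avoids the shared package variables of the subclass version, so several documents can be parsed independently.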
 

Commented:
r_kar,

Did you get the solution?

Please let us know.

Aman.

Author

Commented:
amandeep,
Excellent work.
I was not able to log in for the past week.
Sorry for the delay.
Thanks

Commented:
r_kar,

I am glad I could help you and that you got a working solution.

I think everyone was having login problems at EE for the past week.

Cheers,
Aman
:-)
