r_kar

Get body part

Hi all,
I would like to read an HTML file and store only the body part (without the HTML tags) in a variable. How can I do this?
Here is the HTML:
<HTML><HEAD><TITLE>Mathematics Enrichment Workshop: The Perfect Number Journey</TITLE>
<BODY bgColor=white>
<CENTER>

<P>Mathematicians and nonmathematicians have been fascinated for centuries by
the properties and patterns of numbers. They have noticed that some numbers are
equal to the sum of all of their <FONT color=red><B>factors</B></FONT> (not
including the number itself). The smallest such example is <FONT
color=red>6</FONT>, since <FONT color=red>6</FONT> = 1 + 2 + 3. Such numbers are
called <FONT color=red><B>perfect numbers</B></FONT>.
<P>The search for perfect numbers began in ancient times. The first three
perfect numbers: <FONT color=red>6</FONT>, <FONT color=red>28</FONT> and <FONT
color=red>496</FONT> were known to the ancient mathematicians since the time of
Pythagoras (circa 500 BC).
<P>
</BODY></HTML>

Thanks
amandeep

r_kar,

Here is what you can try:

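$htmlFile="/path/to/html/file.html";
#disable the input record separator so <HTML> reads the whole file at once
undef $/;
#open the html file
open(HTML,$htmlFile) or die "cannot open html file: $!\n";
#read the entire file (slurp)
$content = <HTML>;
#close the file
close(HTML);
#restore the default record separator
$/="\n";
#extract the body part (i = ignore case, s = let . match newlines)
$content =~ /<body[^>]*>(.*?)<\/body[^>]*>/is or die "no <body> element found\n";
$body=$1;
#remove other html tags from the body
$body=~s/<\/?[^>]+>//g;

print $body;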
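
The same steps can be sketched as two small subs under strict and warnings. This is only a sketch: the names slurp and body_text are made up for illustration, and the whitespace clean-up at the end is an extra assumption about what the output should look like, not something the question asked for.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!\n";
    local $/;                 # record separator is undef only inside this sub
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Hypothetical helper: return the text between <body> and </body>
# with the tags stripped, or undef if there is no body element.
sub body_text {
    my ($html) = @_;
    my ($body) = $html =~ /<body[^>]*>(.*?)<\/body\s*>/is
        or return;
    $body =~ s/<[^>]+>//g;    # remove the remaining tags
    $body =~ s/\s+/ /g;       # collapse runs of whitespace (assumption)
    $body =~ s/^ | $//g;      # trim leading and trailing space
    return $body;
}

my $sample = "<HTML><HEAD><TITLE>t</TITLE>\n<BODY bgColor=white>\n"
           . "<P>The smallest is <FONT\ncolor=red>6</FONT>.\n</BODY></HTML>";
print body_text($sample), "\n";
```

For a file on disk the call would be body_text(slurp($htmlFile)). Using local $/ means the separator does not have to be reset by hand afterwards. A regex like this can still be fooled by a ">" inside an attribute value or by markup inside comments, so it suits simple pages like the one in the question.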


Hope this helps. Please let me know.

Aman
:-)


clockwatcher

You could also use HTML::Parser.

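package MyParser;
use HTML::Parser;
@ISA = ("HTML::Parser");

# When new() is called without arguments, HTML::Parser calls the
# start, end and text methods of the subclass for each event.
sub start
{
  # $_[1] is the tag name in lower case
  $inbody = 1 if $_[1] eq "body";
}

sub end
{
  $inbody = 0 if $_[1] eq "body";
}

sub text
{
  # keep text only while inside the body element
  $body .= $_[1] if $inbody;
}

sub GetBodyFromFile
{
  my ($self, $filename) = @_;
  # reset state left over from any earlier parse
  undef $body;
  $inbody = 0;
  $self->parse_file($filename);
  return $body;
}

package main;

my $p = MyParser->new();
print $p->GetBodyFromFile("example.html");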
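
A variation on the same idea, as a rough sketch: HTML::Parser's version 3 interface takes handlers directly, so the state can live in lexical variables instead of package globals. The sub name body_text_from_string is made up for illustration, and the 'dtext' argspec is chosen here so that entities such as &lt; come back decoded, which goes slightly beyond the subclass above.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: collect the decoded text found inside <body>.
sub body_text_from_string {
    my ($html) = @_;
    my $in_body = 0;
    my $body    = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # 'tagname' passes the lower-cased tag name to the handler
        start_h => [ sub { $in_body = 1 if $_[0] eq 'body' }, 'tagname' ],
        end_h   => [ sub { $in_body = 0 if $_[0] eq 'body' }, 'tagname' ],
        # 'dtext' passes the text with entities decoded
        text_h  => [ sub { $body .= $_[0] if $in_body }, 'dtext' ],
    );
    $p->parse($html);
    $p->eof;                  # signal end of document
    return $body;
}

print body_text_from_string(
    '<html><body><p>1 &lt; 2 <b>always</b></p></body></html>'
), "\n";
```

To read from a file instead, $p->parse_file($filename) can replace the parse and eof calls.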
 
r_kar,

Did you get the solution?

Please let us know.

Aman.

r_kar

amandeep,
Excellent work.
I was not able to log in for the past week.
Sorry for the delay.
Thanks
r_kar,

I am glad I could help you and that you got a working solution.

I think everyone has been facing the login problem at EE for the past week.

Cheers,
Aman
:-)