Parse links and get the associated text from a table

Here is the HTML file I've got. It has a table with 2 columns and multiple rows:

<table>

<tr>
<td>
<b>Yahoo</b>
</td>

<td>
<a href="http://www.yahoo.com">here</a>
</td>

</tr>

<tr>
<td>
<b>Hotmail</b>
</td>
<td>
<a href="http://www.hotmail.com">here</a>
</td>

</tr>
....

</table>

=========
I want to extract the text along with the link. I used the LinkExtractor package, but as you can see, the text associated with the links is not particularly descriptive.

Is there another way to extract the text in the first column and associate it with the link?

crest asked:

rj2 commented:
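#!/usr/bin/perl
use strict;
use warnings;

# Quoted heredoc: the HTML is taken literally, with no variable interpolation
my $html = <<'ENDOFHTML';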
<html>
<body>

<table>

<tr>
<td>
<b>Yahoo</b>
</td>

<td>
<a href="http://www.yahoo.com">here</a>
</td>

</tr>

<tr>
<td>
<b>Hotmail</b>
</td>
<td>
<a href="http://www.hotmail.com">here</a>
</td>

</tr>


</table>
</body>
</html>
ENDOFHTML

while($html=~m!<td>\s*(?:<b>)?(\w*)(?:</b>)?\s*</td>\s*<td>\s*<a href="([^">]*)!ig) {
     print $1,",",$2,"\n";
}
ChrisDrake commented:
I think he means to find the links, not just parse that example.

Remember that the href value in the <A> tag might be unquoted, double-quoted, or single-quoted:
<A href=http://hotmail.com>
<A href="http://hotmail.com">
<A href='http://hotmail.com'>

It might also have other attributes, such as target, before the href:

<A target=_blank href=http://hotmail.com>

And there might be other HTML inside the link text:

<A href=http://hotmail.com>an <i>expensive</i> free webmail</a>

...so it's probably about 10 lines of Perl to do properly.
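A minimal core-Perl sketch covering the cases listed above, assuming reasonably well-formed HTML: it accepts double-quoted, single-quoted, or bare href values, allows other attributes such as target around the href, and strips nested tags from the link text. The subroutine name extract_links is hypothetical. Being regex-based, it will still miss things a real HTML parser handles (comments, a ">" inside an attribute value, entities, and it would also match an attribute like data-href).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [href, text] pairs found in $html.
# Regex-based sketch only; it is not a full HTML parser.
sub extract_links {
    my ($html) = @_;
    my @links;
    while ($html =~ m{
        <a\b [^>]*? \bhref \s* = \s*
        (?: "([^"]*)" | '([^']*)' | ([^\s>]+) )   # double-quoted, single-quoted or bare value
        [^>]* >
        (.*?)                                      # link text, may contain other tags
        </a\s*>
    }gisx) {
        my $href = defined $1 ? $1 : defined $2 ? $2 : $3;
        my $text = $4;
        $text =~ s/<[^>]*>//g;    # drop nested tags such as <i>
        $text =~ s/\s+/ /g;       # collapse runs of whitespace
        $text =~ s/^ | $//g;      # trim a leading/trailing space
        push @links, [ $href, $text ];
    }
    return @links;
}

my $sample = <<'HTML';
<A href=http://hotmail.com>bare</A>
<A href="http://hotmail.com">double</A>
<A href='http://hotmail.com'>single</A>
<A target=_blank href=http://hotmail.com>targeted</A>
<A href=http://hotmail.com>an <i>expensive</i> free webmail</a>
HTML

for my $link (extract_links($sample)) {
    print "$link->[0],$link->[1]\n";
}
```

Each link comes back as an [href, text] pair, so the last sample line yields "http://hotmail.com,an expensive free webmail".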
unobserved commented:
Use the HTML::TokeParser::Simple module from CPAN.

It's the easiest, most efficient, and most correct way to do it.

crest (author) commented:
Thanks for all your comments. Yes, I want a more robust solution. unobserved's suggestion of using HTML::TokeParser::Simple sounds good. I even had a look at the source code of HTML::LinkExtractor, which uses the TokeParser::Simple module, but the OO just looks like too much for me...

It would be a great chance for me to learn more about Perl's OO capabilities; so many modules out there use this methodology that it would be useful. But I need some guidance on how to relate the text in one cell to the link in another cell. Cheers.
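One way to relate text in one cell to a link in another cell is to work row by row: find each <tr>, collect its <td> cells, then pair the plain text of the first cell with the href in the second. The sketch below does this with core Perl regexes only, assuming simple two-column rows and no nested tables; the subroutine name pair_rows is hypothetical, and a token parser such as HTML::TokeParser::Simple is the sturdier choice for real-world pages.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: for each <tr> row, pair the text of the first cell
# with the href of the first link in the second cell.
# Assumes simple two-column rows without nested tables (regex sketch).
sub pair_rows {
    my ($html) = @_;
    my @pairs;
    while ($html =~ m{<tr\b[^>]*>(.*?)</tr\s*>}gis) {
        my $row   = $1;
        my @cells = $row =~ m{<td\b[^>]*>(.*?)</td\s*>}gis;
        next unless @cells >= 2;

        (my $label = $cells[0]) =~ s/<[^>]*>//g;    # strip tags such as <b>
        $label =~ s/^\s+|\s+$//g;                   # trim surrounding whitespace

        # href value may be double-quoted, single-quoted or bare
        my ($href) = $cells[1] =~ m{<a\b[^>]*?\bhref\s*=\s*["']?([^"'\s>]+)}i;
        push @pairs, [ $label, $href ] if defined $href;
    }
    return @pairs;
}

my $html = <<'HTML';
<table>
<tr>
<td><b>Yahoo</b></td>
<td><a href="http://www.yahoo.com">here</a></td>
</tr>
<tr>
<td><b>Hotmail</b></td>
<td><a href="http://www.hotmail.com">here</a></td>
</tr>
</table>
HTML

print "$_->[0],$_->[1]\n" for pair_rows($html);
```

For the table in the question this prints "Yahoo,http://www.yahoo.com" and "Hotmail,http://www.hotmail.com". Because the pairing happens inside each row, an extra or missing cell in one row does not shift the pairing for the rows after it.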
rj2 commented:
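#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

my $p = HTML::TokeParser::Simple->new('parsehtml.html')
     or die "Cannot open parsehtml.html: $!";
my ($tdcount, $savetext, $savelink, @text);
my $linktext = '';

while ( my $token = $p->get_token ) {
     # New row: restart the cell count so the odd/even pairing stays per row
     if ( $token->is_start_tag('tr') ) {
          $tdcount  = 0;
          $linktext = '';
     }
     # Odd-numbered cells hold the description, even-numbered cells hold the link
     if ( $token->is_start_tag('td') ) {
          $tdcount++;
          if ( $tdcount % 2 ) {
               $savetext = 1;
          } else {
               $savelink = 1;
          }
     }
     if ( $token->is_end_tag('td') ) {
          $savetext = 0;
          $savelink = 0;
     }
     # Keep the last non-blank text token seen in the description cell
     if ( $savetext && $token->is_text ) {
          my $testtext = $token->as_is;
          $testtext =~ s/^\s+|\s+$//g;    # trim surrounding whitespace, not just the newline
          $linktext = $testtext if length $testtext;
     }
     # Pair the saved description with the href of the link in the next cell
     if ( $savelink && $token->is_start_tag('a') ) {
          my $href = $token->get_attr('href');
          push @text, $linktext . ',' . ( defined $href ? $href : '' );
     }
}
print "$_\n" for @text;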
crest (author) commented:
rj2, thanks very much for your answer.

Your approach is simple and effective. I can adapt it to many different situations.

Cheers
Question has a verified solution.