Parsing an html file

Lemoncurd
I am new to Perl and am trying to parse an HTML file for a specific link.

Basically, I am trying to copy CD cover images from Amazon, so I need to parse the album's HTML page and extract the link that points to the album photo. The relevant section of the HTML is:

<a href="image address">See larger photo</a>

Every album page should contain this link. Any ideas on a regexp to do this?

Also, when I am searching in Perl using m// or s///, how do I extract the results into a separate variable?

Thanks
Actually, what you need is an HTML parsing library. No single regexp deals reliably with all HTML files and tags, because attribute order, quoting, and whitespace can all vary from page to page.
Take a look at this previous answer:
http://www.experts-exchange.com/Programming/Programming_Languages/Perl/Q_20400861.html

Hope that helps.
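
To make the library suggestion concrete, here is a minimal sketch using HTML::TokeParser (part of the HTML-Parser distribution on CPAN, assumed to be installed). The sample page, the example.com URL, and the helper name find_link_by_text are made up for illustration; they are not from the linked answer.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample page; a real script would use the fetched HTML instead.
my $html = <<'HTML';
<html><body>
<a href="/other">Another link</a>
<a class="big" href="http://example.com/cover.jpg">See larger photo</a>
</body></html>
HTML

# find_link_by_text is a hypothetical helper name, not part of any library.
sub find_link_by_text {
    my ($html, $wanted) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    # Walk every <a> start tag in document order.
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};                    # attribute hash of the tag
        my $text = $parser->get_trimmed_text('/a');    # link text up to </a>
        return $href if defined $href && lc $text eq lc $wanted;
    }
    return;    # no matching link found
}

my $link = find_link_by_text($html, 'See larger photo');
print defined $link ? "$link\n" : "No matching link\n";
```

Because the parser reads attributes into a hash, the extra class attribute and the attribute order in the sample do not matter, which is where a fixed regexp tends to fail.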
A regex to get the image address from the example above would be:

my $htmlfile = '<a href="image address">See larger photo</a>';

# [^"]* keeps the capture inside a single href value
if ($htmlfile =~ m{<a href="([^"]*)">See larger photo</a>}i) {
    print $1;    # only read $1 after a successful match
}
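
On the second part of the question (getting the results of m// or s/// into a separate variable), the following is a small sketch using only core Perl. The variable names $link, $copy, and $old_target are illustrative choices, and the input is the placeholder string from the question.

```perl
use strict;
use warnings;

# Sample input taken from the question; the placeholder address is unchanged.
my $htmlfile = '<a href="image address">See larger photo</a>';

# In list context a match returns its captured groups, so the link lands
# in $link directly. If the match fails, the list is empty and $link is undef.
my ($link) = $htmlfile =~ m{<a href="([^"]*)">See larger photo</a>}i;

print defined $link ? "Link: $link\n" : "No link found\n";

# s/// also sets $1 when the substitution succeeds.
my $copy = $htmlfile;
if ($copy =~ s{<a href="([^"]*)">}{<a href="#">}i) {
    my $old_target = $1;    # copy $1 right away, before another match overwrites it
    print "Replaced target: $old_target\n";
}
```

The parentheses around $link matter: without them the assignment is in scalar context and $link would only hold a true or false value saying whether the match succeeded.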

Author

Commented:
Cheers, I have got this working now (for normal cases, anyway).

For others' interest, the two methods look like this:

Tree Version

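   # Build an HTML tree of the search results
   # (needs "use HTML::TreeBuilder;" at the top of the script)
   
   my $tree = HTML::TreeBuilder->new();
   $tree->parse($response->content);
   $tree->eof;
   
   # Search through the tree looking for the 1st link
   # that has the text "See larger photo"
   
   my $photo_link = $tree->look_down('_tag', 'a',
     sub {
           return unless $_[0]->attr('href');
           my @c = $_[0]->content_list;
           @c == 1 and $c[0] eq "See larger photo";
         }
   );
   # look_down returns undef when no link matches, so guard the attr call
   $output = $photo_link ? $photo_link->attr('href') : undef;
   $tree->delete;   # free the tree once the link has been read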



Regex Version

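   # Parse the search result for the cover location
   # (body of a sub; its opening line was not included in the post)
   
   my $htmlfile = $response->content;
   my $photo_link;
   
   # [^"]* keeps the capture inside a single href value
   if ($htmlfile =~ m{<a href="([^"]*)">See larger photo</a>}i)
   {
     $photo_link = $1;
   }
   else
   {
     print "You Are Not At The Item Page\n";
   }
   # Return the album cover link (undef if the page had none)
   return $photo_link;
}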


I am currently unsure which version to use. A more suitable option might emerge once I have to handle cases that don't have a single search result...

Cheers
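
On the open point about pages that do not contain exactly one matching link, here is a hedged sketch of how the regex approach could collect every match. The sample page and example.com URLs are invented, and how to choose among several candidates is left open, since the thread does not say.

```perl
use strict;
use warnings;

# Hypothetical page containing two matching links and one unrelated link.
my $page = join "\n",
    '<a href="http://example.com/a.jpg">See larger photo</a>',
    '<a href="/about">About</a>',
    '<a href="http://example.com/b.jpg">See larger photo</a>';

# With /g in list context, the match returns every captured href at once.
my @photo_links = $page =~ m{<a href="([^"]*)">See larger photo</a>}gi;

if (@photo_links == 0) {
    print "No cover link found\n";
}
elsif (@photo_links == 1) {
    print "Cover: $photo_links[0]\n";
}
else {
    print "Several candidates:\n";
    print "  $_\n" for @photo_links;
}
```

The tree version can be adapted in a similar way: calling look_down in list context (assigning to an array) returns all matching elements rather than only the first.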

Author

Commented:
I want to give Wizard2000 the points as well, as that answer also helped. Is this possible?
Drop a note in the community support forum explaining the point distribution you want with a pointer to this question.
