perl: REGEX on img alt tag

Hi

Given the code and HTML snippet below, I'm trying to extract the name from the img alt attribute.

I can isolate the alt text but can't get just the name!

I want $name = "trevor OBT tumblr_ozxualLdb1who6_540.jpg"

As this is HTML, I have no idea whether each line of the alt text ends in "\n", and my split isn't working.

use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Element;

my $body = HTML::TreeBuilder->new_from_file(*DATA);
my @A = $body->look_down('_tag', 'a');
for my $a (@A) {
    my $url = $a->attr('href');
    if ((defined($url)) && ($url =~ m/attachment/)) {
        print $url . "\n";
        my $img = $a->look_down('_tag', 'img');
        my $alt = $img->attr('alt');
        print "alt [" . $alt . "]\n";  ##  works to here
        my @altBits = split(/nbsp/, $alt);
        foreach my $line (@altBits) {
            if ($line =~ m/Name:\s.*(.*)\&/i) {
                my $name = $1;
                print "name [$name]\n";
            }
        }
    }
    else {
        print $url . "\n";
    }
}  # end for @A
print "Finished \n";

__DATA__
<div class="postbody">
			<div class="postrow">
				<div class="content">
					<div id="post_message_180">
						<blockquote class="postcontent restore ">
							Trevor <br>
<a href="http://www.example.com/vboard/attachment.php?s=b31c60a8e6f7c723&amp;attachmentid=104&amp;d=623527" 
id="attachment1040762" rel="Lightbox_1804154">
<img src="http://www.example.com/vboard/attachment.php?s=b31c60a8e6f7c723&amp;attachmentid=1042&amp;d=623527&amp;thumb=1" 
alt="Click image for larger version.&nbsp;

Name:	trevor OBT tumblr_ozxualLdb1who6_540.jpg&nbsp;
Views:	287&nbsp;
Size:	109.2 KB&nbsp;
ID:	1040762" class="thumbnail" style="float:CONFIG" title="Click image for larger version.&nbsp;

Name:	trevor OBT tumblr_ozxualLdb1who6_540.jpg&nbsp;
Views:	287&nbsp;
Size:	109.2 KB&nbsp;
ID:	1040762" border="0"></a>
						</blockquote>
					</div>

					
				</div>
			</div>
		

trevor1940 Asked:

Shaun Vermaak (Technical Specialist) Commented:
What about this?
(?<=Name:\t).*(?=&nbsp;)

https://regex101.com/r/KVSFE5/2
trevor1940 (Author) Commented:
Hi,

Your regex works on plain text, but I think the lines below create objects, so you can't run a regex on an object.

        my $img = $a -> look_down('_tag', 'img');
        my $alt = $img->attr('alt'); 

Doing this

        my $img = $a -> look_down('_tag', 'img');
        my $alt = $img->attr('alt')->as_text;

Gives this error

http://www.example.com/vboard/attachment.php?s=b31c60a8e6f7c723&attachmentid=104&d=623527
Can't locate object method "as_text" via package "Click image for larger version.á

Name:   trevor OBT tumblr_ozxualLdb1who6_540.jpgá
Views:  287á
Size:   109.2 KBá
ID:     1040762" (perhaps you forgot to load "Click image for larger version.á

Name:   trevor OBT tumblr_ozxualLdb1who6_540.jpgá
Views:  287á
Size:   109.2 KBá
ID:     1040762"?) at D:\PerlScripts\GetAlt.pl line 15.
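
The error above names the alt text itself as the "package", which suggests that attr('alt') returned a plain string rather than an object. The following is a minimal sketch of that behaviour using only core Perl; describe_value is a hypothetical helper and the sample string is a stand-in for the real attribute value.

```perl
use strict;
use warnings;

# Hypothetical helper: says whether a value is a plain string or a reference/object.
sub describe_value {
    my ($value) = @_;
    return ref($value) ? 'reference: ' . ref($value) : 'plain string';
}

# Stand-in for the value returned by $img->attr('alt') (an assumption for illustration).
my $alt = "Click image for larger version.";
print describe_value($alt), "\n";

# Calling a method on a plain string makes Perl treat the string as a class name,
# which fails because no such package exists.
my $ok = eval { $alt->as_text; 1 };
print "method call failed: $@" unless $ok;
```

If the value is a plain string, a regex can be applied to it directly and no as_text call is needed.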

trevor1940 (Author) Commented:
I figured it out by doing a data dump on $alt, which revealed:

"Click image for larger version.\xA0\n\nName:\ttrevor OBT tumblr_ozxualLdb1who6_540.jpg\xA0\nViews:\t287\xA0\nSize:\t109.2 KB\xA0\nID:\t1040762"

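One way to produce a dump like this is the core Data::Dumper module with escaping switched on. This is only a sketch: visible is a hypothetical helper, the sample string is shortened, and the exact escape notation (for example octal \240 versus hex for the non-breaking space) depends on the dumper and on how the string is encoded.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: returns the string with invisible characters shown as escapes.
sub visible {
    my ($text) = @_;
    local $Data::Dumper::Useqq = 1;  # double-quoted output with escapes for tabs, newlines, high bytes
    local $Data::Dumper::Terse = 1;  # drop the "$VAR1 = " prefix
    return Dumper($text);
}

# Shortened stand-in for the real alt text (an assumption for illustration).
my $alt = "Name:\ttrevor.jpg\xA0\nViews:\t287";
print visible($alt);
```

Seeing the separators spelled out makes it clear which characters the regex has to match.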
so my REGEX became

            if ($alt =~ m/(?<=Name:\t)(.*)(?=\xA0)/i){
                my $name =$1;
                print "name [$name]\n";                
                }
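
For reference, here is a self-contained sketch of the same idea that runs without the HTML modules. extract_name is a hypothetical helper, and the string is the dumped alt text from above; instead of lookarounds it captures everything after "Name:" up to the first non-breaking space or newline.

```perl
use strict;
use warnings;

# Hypothetical helper: pulls the file name out of the decoded alt text.
sub extract_name {
    my ($alt) = @_;
    # Skip the tab/spaces after "Name:", then capture up to the first \xA0 or newline.
    return $alt =~ /Name:[ \t]*([^\xA0\n]+)/ ? $1 : undef;
}

my $alt = "Click image for larger version.\xA0\n\nName:\ttrevor OBT tumblr_ozxualLdb1who6_540.jpg\xA0\nViews:\t287\xA0\nSize:\t109.2 KB\xA0\nID:\t1040762";

my $name = extract_name($alt);
print "name [$name]\n" if defined $name;   # name [trevor OBT tumblr_ozxualLdb1who6_540.jpg]
```

Returning undef when nothing matches lets the caller tell a missing name apart from an empty one.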

trevor1940 (Author) Commented:
Hi,
Please see my last comment for the solution.
I forgot that &nbsp; becomes the character \xA0 (a non-breaking space) once the HTML has been parsed with HTML::TreeBuilder.

Thanks for your help