Trying to parse an HTML file

Posted on 2003-10-27
Last Modified: 2010-03-05
I'm trying to run a book example and I get the following error.

E:\>perl parser.pl
Can't locate HTML/Tagset.pm in @INC (@INC contains: E:/ind/perl/lib E:/ind/perl/site/lib .) at E:/ind/perl/site/lib/HTML/LinkExtor.pm line 31.
BEGIN failed--compilation aborted at E:/ind/perl/site/lib/HTML/LinkExtor.pm line 31.
Compilation failed in require at parser.pl line 5.
BEGIN failed--compilation aborted at parser.pl line 5.

Source code:

#!e:/ind/perl/bin/perl -w

use strict;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

my $url = URI::URL->new('http://www.perl.com/');
my $base_url;

# Create new UserAgent object (browser)
my $ua = LWP::UserAgent->new();

# Give our agent a name
$ua->agent("Mozilla/4.7");

# Create HTTP GET request
my $request = HTTP::Request->new(GET => $url);

# Execute HTTP request
my $response = $ua->request($request);

# Check success
if ($response->is_success && $response->content_type eq 'text/html') {
    # Request was successful and is html
    $base_url = $response->base();
    print "Base URL: $base_url\n";
    my $link_extor = HTML::LinkExtor->new(\&extract_links);
    $link_extor->parse($response->content);
} else {
    # Request failed - print response code and message
    print "Error getting document: ", $response->status_line, "\n";
}

sub extract_links {
    my ($tag, %attr) = @_;

    if ($tag eq 'a' or $tag eq 'img') {
        foreach my $key (keys %attr) {
            if ($key eq 'href' or $key eq 'src') {
                my $link_url = URI->new($attr{$key});
                my $full_url = $link_url->abs($base_url);
                print "LINK: $full_url\n";
            }
        }
    }
}
Question by:mistadontplay
Accepted Solution

by: fantasy1001 (earned 63 total points)
ID: 9631885
I'm not sure what the problem is. Please check whether the module file Tagset.pm is in the directory E:/ind/perl/site/lib/HTML/. If it is not, please download it from
http://search.cpan.org/~sburke/HTML-Tagset-3.03/Tagset.pm

If the problem still arises after you copy the file into that directory, add the line use HTML::Tagset; to the top of your source code.
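
The directory check suggested above can also be done from Perl itself. The following is only a minimal sketch, not part of the book example: the helper name find_module_dirs is made up for illustration, and it assumes the module would be installed as a plain HTML/Tagset.pm file under one of the @INC directories.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every directory (from @INC, or from a
# supplied list) that contains the file for the given module name,
# e.g. 'HTML::Tagset' is looked up as HTML/Tagset.pm.
sub find_module_dirs {
    my ($module, @dirs) = @_;
    @dirs = @INC unless @dirs;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    # @INC may contain code references (hooks); skip anything that is not a path.
    return grep { !ref($_) && -f File::Spec->catfile($_, @parts) } @dirs;
}

my $module = 'HTML::Tagset';
my @found  = find_module_dirs($module);

if (@found) {
    print "$module found in: $_\n" for @found;
} else {
    print "$module not found in any \@INC directory:\n";
    print "  $_\n" for grep { !ref($_) } @INC;
}
```

If the script reports that the module is not found, the file is missing from every directory Perl searches, which matches the "Can't locate HTML/Tagset.pm in @INC" message in the question.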

Thanks & Cheers

Assisted Solution

by: davorg (earned 62 total points)
ID: 9633418
HTML::LinkExtor uses HTML::Tagset. It seems that you've installed HTML::LinkExtor, but not HTML::Tagset.
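
To see at a glance which of the modules involved are actually loadable, something like the sketch below may help. It is an illustration under assumptions, not a definitive tool: missing_modules is a hypothetical helper name, and the module list is simply what parser.pl uses plus HTML::Tagset.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: try to load each named module and return the
# names of those that fail to load.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Convert a module name to its file path: HTML::Tagset -> HTML/Tagset.pm
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# Modules that parser.pl loads directly, plus HTML::Tagset, which
# HTML::LinkExtor requires (according to the error message in the question).
my @wanted  = qw(LWP::UserAgent HTML::LinkExtor HTML::Tagset URI::URL);
my @missing = missing_modules(@wanted);

if (@missing) {
    print "Missing or failing to load: @missing\n";
} else {
    print "All modules loaded.\n";
}
```

Note that a module whose own dependency is missing will be reported as well, so HTML::LinkExtor and HTML::Tagset may both show up until HTML::Tagset is installed. On a typical setup the missing module can then be installed with the CPAN client, for example perl -MCPAN -e "install HTML::Tagset".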

Expert Comment

by:jmcg
ID: 10038301
Nothing has happened on this question in over 2 months. It's time for cleanup!

My recommendation, which I will post in the Cleanup topic area, is to
split points between fantasy1001 and davorg.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

jmcg
EE Cleanup Volunteer