Solved

Adding modules to @INC

Posted on 2004-08-04
Last Modified: 2008-01-09
I'm trying to run a script that uses TableExtract.pm, which in turn uses Parser.pm and Entities.pm. I've placed these files in the two lib directories, c:\perl\lib and c:\perl\site\lib.

I ran perl -e "print join(qq[\n], @INC)" and it listed the two paths above. I pasted the .pm files in through Notepad; do they need to be compiled? Here's the error:

C:\>perl ./html2csv.pl shitlink.html
Can't locate HTML/TableExtract.pm in @INC (@INC contains: /home/ron/modules/lib/site_perl /home/ron/modules c:/Perl/lib c:/Perl/site/lib .) at ./html2csv.pl line 17.

For some reason it lists /home/ron/modules/lib/site_perl, which I can't find anywhere on my system. Why doesn't this path show up when I run perl -e "print join(qq[\n], @INC)"?
Question by:dprasad
11 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 11723466
Did you put TableExtract.pm in c:/Perl/lib/HTML or /home/ron/modules/HTML?
 

Author Comment

by:dprasad
ID: 11723655
I don't have a c:\perl\lib\HTML or a /home/ron/modules/HTML directory. I looked everywhere and searched on the directory name, but I don't see it anywhere.
 

Author Comment

by:dprasad
ID: 11723683
Could it be hidden somehow?
 
LVL 84

Expert Comment

by:ozo
ID: 11723721
Try creating the directory c:/Perl/lib/HTML or /home/ron/modules/HTML and put TableExtract.pm in it.
 
LVL 84

Expert Comment

by:ozo
ID: 11723743
How did you install the HTML::TableExtract  module?

 

Author Comment

by:dprasad
ID: 11723753
OK, I'll try that. Well, I just copied TableExtract.pm from the CPAN site into a text file and saved it with a .pm extension. Am I missing something here?
 

Author Comment

by:dprasad
ID: 11723846
OK, I'm sorry, it's late. Here's my code; there's a library call that I overlooked. I was given this script by someone, and I will ask them what those libs are from. I tried what you said, though, and the program runs, printing the information to the screen. All I'm trying to do is convert HTML tables into CSV tables for import into a database. The script is designed to accept a URL; I gave it an HTML file.

I will ask the person who gave this to me more about it.





#!/usr/bin/perl

# Copyright (C) 2003 Ron Coscorrosa
# Released under the terms of the MIT License

# Converts HTML tables to CSV format for manipulation within
# a spreadsheet program.

use strict;
use warnings;

use lib('/home/ron/modules');
use lib('/home/ron/modules/lib/site_perl');

use LWP::Simple qw/get/;
use CGI::Pretty qw/:standard/;
use HTML::TableExtract;


if (param('url'))
{
  print header('text/plain');
  write_csv(param('url'));
}
else
{
   print header;
   print start_html('-title'=>'URL to CSV Converter');
   print start_form('-method'=>'get');
   print "URL: ", textfield('-name'=>'url', '-size'=>65), p;
   print submit('-value'=>'Convert URL to CSV');
   print end_form, p;
   print "Converts the tables in a URL to CSV format for easier manipulation in a spreadsheet program\n", p;
   print "Copyright &copy; 2003 <a href=\"mailto:web\@coscorrosa.com\">Ron Coscorrosa</a>\n";
   print end_html;
}


sub write_csv
{
  my $html = get($_[0]);

  if (! $html)
  {
     print "Unable to retrieve HTML from URL: \"$_[0]\".  Please make sure you typed the URL correctly.";
     return;
  }

  my $te = new HTML::TableExtract();
  $te->parse($html);

  for my $ts ($te->table_states)
  {
     for my $row ($ts->rows)
     {
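        # Trim each cell, flatten newlines, and double embedded quotes for CSV.
        print join(',', map { my $c = defined $_ ? $_ : ''; $c =~ s/^\s+|\s+$//g; $c =~ s/\n+/ /g; $c =~ s/"/""/g; "\"$c\"" } @$row), "\n";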
     }
  }
}
 
LVL 84

Accepted Solution

by:
ozo earned 250 total points
ID: 11723876
use lib('/home/ron/modules'); # is what's putting /home/ron/modules in @INC
If that's not where you put HTML/TableExtract.pm, then point use lib at the directory where you did put it.
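To illustrate the point about use lib: the pragma runs when the script is compiled and puts its directories at the front of @INC, which is why the extra paths show up in the script's error message but not in a bare perl -e one-liner. A minimal sketch, assuming (hypothetically) that the modules were copied into a lib directory next to the script, for example lib/HTML/TableExtract.pm:

```perl
use strict;
use warnings;
use FindBin qw($Bin);    # core module; $Bin is the directory containing this script

# Assumption for illustration: modules live in a "lib" directory beside the script.
use lib "$Bin/lib";

# Print the search path as this script sees it; "$Bin/lib" is listed first.
print "$_\n" for @INC;
```

Using $Bin instead of a hard-coded path such as /home/ron/modules keeps the script working when it is moved to another machine.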
 
LVL 8

Assisted Solution

by:davorg
davorg earned 250 total points
ID: 11724489
In general, Perl modules shouldn't just be copied into directories that may or may not be in @INC. You should install them properly using the procedure described in the README file. On Windows you can often circumvent that by using the "ppm" program which comes with ActivePerl.

But if you really need to install it by copying, then for a module called HTML::TableExtract you need to create a directory called HTML within one of the directories in @INC (that should probably be c:\perl\site\lib) and put TableExtract.pm into that directory.

Hope that helps,

Dave...
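The naming rule described above can be sketched in a few lines: each "::" in a module name becomes a directory separator, ".pm" is appended, and Perl looks for that relative path under each directory in @INC. The helper names module_to_path and find_module below are made up for this illustration and are not part of any library:

```perl
use strict;
use warnings;

# Convert a module name to the relative path Perl searches for,
# e.g. HTML::TableExtract becomes HTML/TableExtract.pm.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Return the first file under @INC that matches the module, or nothing.
sub find_module {
    my ($module) = @_;
    my $rel = module_to_path($module);
    for my $dir (@INC) {
        next if ref $dir;              # skip code-reference hooks in @INC
        my $full = "$dir/$rel";        # forward slashes also work on Windows
        return $full if -f $full;
    }
    return;
}

for my $module ('HTML::TableExtract', 'HTML::Parser', 'HTML::Entities') {
    my $found = find_module($module);
    printf "%-20s %s\n", $module, defined $found ? $found : 'not found in @INC';
}
```

Note that finding the file is necessary but not always sufficient: HTML::Parser includes compiled (XS) code, so copying Parser.pm by hand is not enough for that module, which is one more reason to prefer a proper install.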
 

Author Comment

by:dprasad
ID: 11729960
OK guys, I just realized this script isn't going to work for what I'm trying to do. I'm retrieving data from tables, but I need to split up some of the data that's in one cell and put it into separate columns, for example:

(html col 1)                    (html col 2)
vintage: 1966                  price: $120.00
style: red
region: finger lakes

needs to look like (in the csv file) for each record:

col1        col2      col3             col4
vintage    style     region          price

1966        red       fingerlakes   120

So it seems more prudent to use the HTML::TreeBuilder and then parse the data from that, so I can delimit according to the html tag and not by table cells.

Here's the actual html

<table width="100%" border="0" cellspacing="0" cellpadding="0">

        <tr>
          <td>
                  <span class="bodybold">Name: </span><span class="body">
            <A HREF="http://www.awinestore.com/cgi-local/quikstore.cgi?store=ny&search=yes&detail=yes&product=08500000818&category=&keywords=&hits_seen=&page=search.html&and=&affiliate_id=">Anapamu Pinot Noir</a></span>
                 <span class="bodyunder"></span><br>
                  <span class="bodybold">Vintage: </span><span class="body">2001</span><br>
                  <span class="bodybold">Style: </span><span class="body">Pinot Noir</span><br>

                  <span class="bodybold">Vineyard: </span><span class="body"></span><br>
                  <span class="bodybold">Region: </span><span class="body">Monterey County</span></td>
        </tr>
        <tr>
          <td class="body"><span class="bodybold">Description:</span><span class="body">Round, with modestly tannic earth, ripe plum, currant and sweat character. Drink now through 2005. 21,432 cases made. - Wine Spectator 7/31/03 &nbsp;</span><br>
          <span class="bodybold">Size:</span> 750ml&nbsp;&nbsp;<span class="bodybold">Qty Available:</span>  10<br><span class="bodybold">Store:</span>&nbsp;A Wine Store New York
        </td>

        </tr>
      </table>

Can HTML::TreeBuilder handle span tags? If not, I can do a replace-all and change them to something else. This script seems to do what I need, except that it parses based on <a href...> tags and prints to the screen. I need it to write the results to a CSV file.

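use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse_file('example.html') || die $!;

# Print the text of every link that has an href attribute.
# ($link rather than $a, because $a is special in sort blocks.)
foreach my $link ($tree->find_by_tag_name('a')) {
    my $href = $link->attr('href') || next;
    my $text_content = $link->as_text;
    print "$text_content\n";
}

$tree->delete;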

Thanks!
 

Author Comment

by:dprasad
ID: 11729974
sorry my perl knowledge is extremely limited!
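On the follow-up question: HTML::TreeBuilder treats span like any other element, so the label spans can be located with find_by_tag_name or look_down. Once each "Label: value" string has been pulled out, turning the strings into one CSV row per record needs only core Perl. The sketch below covers that second half only; the helper names (parse_fields, csv_quote, csv_row), the column list, and the output file name wines.csv are assumptions made for illustration, and the sample strings are typed in by hand rather than extracted from the page:

```perl
use strict;
use warnings;

# Column order for the CSV; labels taken from the posted HTML.
my @columns = ('Name', 'Vintage', 'Style', 'Vineyard', 'Region');

# Turn a list of "Label: value" strings into a hash keyed by label.
sub parse_fields {
    my %field;
    for my $text (@_) {
        next unless $text =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/s;
        $field{$1} = $2;
    }
    return \%field;
}

# Quote one value for CSV: wrap in double quotes, doubling embedded quotes.
sub csv_quote {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Build one CSV line from a record hash, in the given column order.
sub csv_row {
    my ($field, @cols) = @_;
    return join(',', map { csv_quote($field->{$_}) } @cols);
}

# Hand-typed sample record standing in for text extracted from the HTML.
my $record = parse_fields(
    'Name: Anapamu Pinot Noir',
    'Vintage: 2001',
    'Style: Pinot Noir',
    'Vineyard: ',
    'Region: Monterey County',
);

open my $out, '>', 'wines.csv' or die "Cannot write wines.csv: $!";
print {$out} join(',', map { csv_quote($_) } @columns), "\n";    # header row
print {$out} csv_row($record, @columns), "\n";
close $out or die "Cannot close wines.csv: $!";
```

Missing labels come out as empty quoted cells, so every row keeps the same number of columns.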