?
Solved

HTML:Parser cgi script

Posted on 2003-03-06
13
Medium Priority
?
213 Views
Last Modified: 2010-08-05
I need someone to show me how to write a cgi script to run on windows & unix called from browser, the code needs to loop through a list (list.dat) of urls ie:
http://www.awebsite.com
http://www.awebsite2.co.uk
http://www.awebsite3.net
http://www.awebsite4.org

call up the url's and write/print the following to a tab delimeted file, info.txt

1. The first 8 characters of the url ie: ' awebsite ' or everything between www. and .com (or whatever after . ie .net .co.uk .org) if the address has less than 8 characters. If the address is a repeat then add number
(incremented) ' awebsite_1 '.
2. The 'url'
3. The 'title'
4. The 'action' ie http://www.awebsite.com/cgi-bin/script.cgi
5. The method ie: POST
6. The 'input' name - values ie:
<input type=hidden name='user' value='JohnGreen'>
<input type=hidden name='state' value='NewYork'>
user=JohnGreen,state=NewYork
7. Print a line ------------- between each loop

Write all the above to the browser

write 1. to another file info2.txt ie:
<option value="awebsite">awebsite:General: FFS</option>
0
Comment
Question by:malkie
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 5
13 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 8084182
Where do the 'title', 'action', and 'input' name - values come from?
0
 
LVL 48

Expert Comment

by:Tintin
ID: 8085003
Your specification is reasonably clear apart from the stuff ozo mentioned.  

Once you've supplied that, the rest of it is pretty straight forward.
0
 

Author Comment

by:malkie
ID: 8088595
The 'title', 'action', and 'input' name - values comes from the html source code of the url which is called. So if no 'title' then the script would write 'no title' to the file and so on.
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
LVL 84

Expert Comment

by:ozo
ID: 8092333
use LWP::Simple;
require HTML::Parser;
{
     package MyParser;
     use base qw(HTML::Parser);
    sub start {
        my ($self,$tag, $attr, $attrseq, $origtext) = @_;
        $in{$tag}++;
        if( $tag eq "form" ){
            print "action: $attr->{action}\n";
            print "method: $attr->{method}\n";
        }
        if( $tag eq "input" ){
            print "$origtext\n";
        }
        if( $tag eq "oprion" ){
            $option=$origtext;
        }

    }
    sub end {
        my ($self,$tag, $attr, $attrseq, $origtext) = @_;
        $in{$tag}--;
        if( $tag eq "title" ){
            $title ||= 'no title';
            print "title: $title\n";
            $option = qq(<option value="$main::u">$main::u:$title</option>\n);
            $title='';
        }
    }
    sub text {
        my ($self,$text) = @_;
        if( $in{title} ){
                $title .= $text;
        }
    }
}

sub start {
  my ($tag, $attr, $attrseq) = @_;

}
$p = new MyParser;
open L,"<list.dat" or die "Can't open list.dat $!";
open I1,">info.txt" or die "Can't open info.txt $!";
open I2,">info2.txt" or die "Can't open info2.txt $!";
select I1;

while( <L> ){
     chomp;
     ($u) = m'//(?:www\.)?([^.]{0,8})';
     if( $n = $n{$u}++ ){ $u .= "_$n"; }
     next unless $content = get($_);
     print I1 $sep;
     $sep = "-------------\n";
     print I1 "$_\n";
     $p->parse($content);
     print I2 $MyParser::option;
}
0
 

Author Comment

by:malkie
ID: 8093315
I ran the script calling " ozo1.cgi " from the browser on windows and it came up with:-
CGI Error
The specified CGI application misbehaved by not returning a complete set of HTTP headers. The headers it did return are:

Can't open list.dat No such file or directory at C:\Inetpub\wwwroot\cgi-bin\extracthtml\ozo1.cgi line 44.

I added:-
# Absolute path to list.dat file:
my $infile = "/Inetpub/wwwroot/cgi-bin/extracthtml/list.dat";

#open L,"<list.dat" or die "Can't open list.dat $!";
open L,"<$infile" or die "Can't open list.dat $!";

I ran the script and it came up with:-
CGI Error
The specified CGI application misbehaved by not returning a complete set of HTTP headers. The headers it did return are:
0
 
LVL 84

Expert Comment

by:ozo
ID: 8093397
use CGI qw/:standard/;
use CGI::Carp 'fatalsToBrowser';

my $infile = "/Inetpub/wwwroot/cgi-bin/extracthtml/list.dat";

open L,"<$infile" or die "Can't open $infile $!";
print  header,
       start_html('HTML:Parser cgi script');
print "<pre>";

while( <L> ){
    chomp;
    ($u) = m'//(?:www\.)?([^.]{0,8})';
    if( $n = $n{$u}++ ){ $u .= "_$n"; }
    next unless $content = get($_);
    print "$sep$_\n";
    print I1 "$sep$_\n";  
    $sep = "-------------\n";
    $p->parse($content);
    select I1;
    $p->parse($content);
    select STDOUT;
    print I2 $MyParser::option;
}
print "</pre>";
print end_html;
0
 

Author Comment

by:malkie
ID: 8093440
As code below, it prints to browser.
It is not writing to files, I tried to put this code in various places???

#### WHERE TO PUT THIS CODE IT IS NOT WRITING TO FILES
open I1,">info.txt" or die "Can't open info.txt $!";
open I2,">info2.txt" or die "Can't open info2.txt $!";
select I1;

--------
 
use LWP::Simple;
require HTML::Parser;
use CGI qw/:standard/;
use CGI::Carp 'fatalsToBrowser';

# Absolute path to list.dat file:
my $infile = "/Inetpub/wwwroot/cgi-bin/extracthtml/list.dat";
{
    package MyParser;
    use base qw(HTML::Parser);
   sub start {
       my ($self,$tag, $attr, $attrseq, $origtext) = @_;
       $in{$tag}++;
       if( $tag eq "form" ){
           print "action: $attr->{action}\n";
           print "method: $attr->{method}\n";
       }
       if( $tag eq "input" ){
           print "$origtext\n";
       }
       if( $tag eq "option" ){
           $option=$origtext;
       }

   }
   sub end {
       my ($self,$tag, $attr, $attrseq, $origtext) = @_;
       $in{$tag}--;
       if( $tag eq "title" ){
           $title ||= 'no title';
           print "title: $title\n";
           $option = qq(<option value="$main::u">$main::u:$title</option>\n);
           $title='';
       }
   }
   sub text {
       my ($self,$text) = @_;
       if( $in{title} ){
               $title .= $text;
       }
   }
}

sub start {
 my ($tag, $attr, $attrseq) = @_;

}
$p = new MyParser;
#open L,"<list.dat" or die "Can't open list.dat $!";

open L,"<$infile" or die "Can't open list.dat $!";
print  header,
      start_html('HTML:Parser cgi script');
print "<pre>";
while( <L> ){
    chomp;
    ($u) = m'//(?:www\.)?([^.]{0,8})';
    if( $n = $n{$u}++ ){ $u .= "_$n"; }
    next unless $content = get($_);
    print I1 $sep;
    $sep = "-------------\n";
    print I1 "$_\n";
    $p->parse($content);
    print I2 $MyParser::option;
}
print "</pre>";
print end_html;

#### WHERE TO PUT THIS CODE IT IS NOT WRITING TO FILES
open I1,">info.txt" or die "Can't open info.txt $!";
open I2,">info2.txt" or die "Can't open info2.txt $!";
select I1;
0
 

Author Comment

by:malkie
ID: 8093687
I have tried to move this part of the code from where it is above:-
#### WHERE TO PUT THIS CODE IT IS NOT WRITING TO FILES
open I1,">info.txt" or die "Can't open info.txt $!";
open I2,">info2.txt" or die "Can't open info2.txt $!";
select I1;

but get the following error:-

CGI Error
The specified CGI application misbehaved by not returning a complete set of HTTP headers. The headers it did return are:
Prototype mismatch: sub main::head ($) vs none at C:/Perl/lib/CGI.pm line 231.
0
 
LVL 84

Expert Comment

by:ozo
ID: 8095133
use CGI;
use CGI::Carp 'fatalsToBrowser';
use LWP::Simple;
require HTML::Parser;
{
    package MyParser;
    use base qw(HTML::Parser);
   sub start {
       my ($self,$tag, $attr, $attrseq, $origtext) = @_;
       $in{$tag}++;
       if( $tag eq "form" ){
           print "action: $attr->{action}\n";
           print "method: $attr->{method}\n";
       }
       if( $tag eq "input" ){
           print "$origtext\n";
       }
       if( $tag eq "oprion" ){
           $option=$origtext;
       }

   }
   sub end {
       my ($self,$tag, $attr, $attrseq, $origtext) = @_;
       $in{$tag}--;
       if( $tag eq "title" ){
           $title ||= 'no title';
           print "title: $title\n";
           $option = qq(<option value="$main::u">$main::u:$title</option>\n);
           $title='';
       }
   }
   sub text {
       my ($self,$text) = @_;
       if( $in{title} ){
               $title .= $text;
       }
   }
}


$p = new MyParser;

my $infile = "/Inetpub/wwwroot/cgi-bin/extracthtml/list.dat";
my $infofile = "/Inetpub/wwwroot/cgi-bin/extracthtml/info.txt";
my $infofile = "/Inetpub/wwwroot/cgi-bin/extracthtml/info2.txt";
open L,"<$infile" or die "Can't open $infile $!";
open I1,">$infofile" or die "Can't open $infofile $!";
open I2,">$info2file" or die "Can't open $info2file $!";
$q = new CGI;
print  $q->header,
       $q->start_html('HTML:Parser cgi script');
print "<pre>";

while( <L> ){
   chomp;
   ($u) = m'//(?:www\.)?([^.]{0,8})';
   if( $n = $n{$u}++ ){ $u .= "_$n"; }
   next unless $content = get($_);
   print "$sep$_\n";
   print I1 "$sep$_\n";  
   $sep = "-------------\n";
   $p->parse($content);
   select I1;
   $p->parse($content);
   select STDOUT;
   print I2 $MyParser::option;
}
print "</pre>";
print $q->end_html;
0
 

Author Comment

by:malkie
ID: 8106408
Sorry for my delay.

The snip from print below and what was required:-

info.txt is:-
http://yes/valuepairs1.html
title: test
action: http://yes/cgi-bin/ascript.pl
method: POST
<input type=hidden name='userdir' value='lnks'>
<input type=hidden name= lnkuser value= okay>
<input type=text name= emailid size=30>
<input type=text name="title" size=40>
<input type=text name="url" size=55>
<input type=submit value="Add">
<input type=reset>

required:-
yes/valu_1
http://yes/valuepairs1.html
test
http://yes/cgi-bin/ascript.pl
POST
<input type=hidden name='userdir' value='lnks'>
<input type=hidden name= lnkuser value= okay>
<input type=text name= emailid size=30>
<input type=text name="title" size=40>
<input type=text name="url" size=55>
<input type=submit value="Add">
<input type=reset>

info2.txt is:-
<option value="yes/valu_1">yes/valu_1:test</option>

required:-
<option value="yes/valu_1">yes/valu_1:General: FFS</option>
0
 

Author Comment

by:malkie
ID: 8113252
is it possible to make the two changes to the script???
0
 
LVL 84

Accepted Solution

by:
ozo earned 2000 total points
ID: 8117379
use CGI;
use CGI::Carp 'fatalsToBrowser';
use LWP::Simple;
require HTML::Parser;
{
   package MyParser;
   use base qw(HTML::Parser);
  sub start {
      my ($self,$tag, $attr, $attrseq, $origtext) = @_;
      $in{$tag}++;
      if( $tag eq "form" ){
          print "$attr->{action}\n";
          print "$attr->{method}\n";
      }
      if( $tag eq "input" ){
          print "$origtext\n";
      }
      if( $tag eq "oprion" ){
          $option=$origtext;
      }

  }
  sub end {
      my ($self,$tag, $attr, $attrseq, $origtext) = @_;
      $in{$tag}--;
      if( $tag eq "title" ){
          $title ||= 'no title';
          print "title: $title\n";
          $option = qq(<option value="$main::u">$main::u:General: FFS</option>\n);
          $title='';
      }
  }
  sub text {
      my ($self,$text) = @_;
      if( $in{title} ){
          $title .= $text;
      }
  }
}


$p = new MyParser;

my $infile = "/Inetpub/wwwroot/cgi-bin/extracthtml/list.dat";
my $infofile = "/Inetpub/wwwroot/cgi-bin/extracthtml/info.txt";
my $info2file = "/Inetpub/wwwroot/cgi-bin/extracthtml/info2.txt";

open L,"<$infile" or die "Can't open $infile $!";
open I1,">$infofile" or die "Can't open $infofile $!";
open I2,">$info2file" or die "Can't open $info2file $!";
$q = new CGI;
print  $q->header,
      $q->start_html('HTML:Parser cgi script');
print "<pre>";

while( <L> ){
  chomp;
  ($u) = m'//(?:www\.)?([^.]{0,8})';
  if( $n = $n{$u}++ ){ $u .= "_$n"; }
  next unless $content = get($_);
  print "$sep$u\n$_\n";
  print I1 "$sep$u\n$_\n";  
  $sep = "-------------\n";
  $p->parse($content);
  select I1;
  $p->parse($content);
  select STDOUT;
  print I2 $MyParser::option;
}
print "</pre>";
print $q->end_html;
0
 

Author Comment

by:malkie
ID: 8122309
Thank you it works great. God Bless
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

On Microsoft Windows, if  when you click or type the name of a .pl file, you get an error "is not recognized as an internal or external command, operable program or batch file", then this means you do not have the .pl file extension associated with …
Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans
Suggested Courses

741 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question