• Status: Solved

How to open a file from HTML using Perl / CGI?

How can I open a file from an HTML page by calling a Perl script?
For example: a web page with a button. A click of the button should trigger a Perl script that opens the text file.

the text file looks like this

File - words.txt

hello
world
text
new
a
of
the
etc..
sdesar Asked:

1 Solution
 
mehdiCommented:
-HTML

<form action="myscript.pl" method="post">

<input type="submit">

</form>

-PERL

#!/usr/local/bin/perl -w  
# point to where YOUR perl is installed

# the all important header.. print it before any other output
print "Content-type: text/html\n\n";

# point to where your file is.. this example is a uni* path
$path = '/home/myname/words.txt';

# open the file (and say why if it fails)
open (FH, '<', $path) or die "Cannot open $path: $!";

# loop through the file, print each line
while ($line=<FH>) {
  print $line;
}

# close the file
close (FH);
exit(0);

~~~~~~~~~~~~~~~~~~~~~~~~~~~
The secret lemonade drinker
 
sdesarAuthor Commented:
I am getting a 500 Internal server error.

Here's the code (I tried both a relative path and an absolute path)-
#!/usr/local/bin/perl -w  
# point to where YOUR perl is installed


#point to where your file is.. this example is a uni* path
$path = 'stop_words.txt';

#open the file
open (FH, "$path");

# the all important header..
print "Content-type: text/html\n\n";

#loop through file, print each line
while ($line=<FH>) {
 print $line;
}

#close the file
close (FH);
exit(1);

#!/usr/local/bin/perl -w  
# point to where YOUR perl is installed


#point to where your file is.. this example is a uni* path
$path = '/home/textseem/www/cgi-bin/stop_words.txt';

#open the file
open (FH, "$path");

# the all important header..
print "Content-type: text/html\n\n";

#loop through file, print each line
while ($line=<FH>) {
 print $line;
}

#close the file
close (FH);
exit(1);


here's the website - http://208.56.56.72/test2/hello.html

Thanks for your help!
Awaiting a response soon.
 
elminsterCommented:
Check that the Perl file is readable and executable for all.
( chmod 755 <perl_file> )

It's the first thing I check when I see the '500 Internal' error.
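To sketch the permission fix: the file names below are stand-ins for the actual script and data file (created with `touch` here so the commands run anywhere):

```shell
touch myscript.pl stop_words.txt   # stand-ins for the real files on the server
chmod 755 myscript.pl              # script: readable and executable by all
chmod 644 stop_words.txt           # data file: readable by the web server
ls -l myscript.pl stop_words.txt   # verify the permission bits
```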
 
mehdiCommented:
that error could mean anything. Have a look at your error log (find out where it is from your ISP) and find the exact cause of the problem.

What platform is this on? My example assumed Uni*.

But first, try adding the following line just *beneath* the line #!/usr/local/bin/perl -w

the line is:

use CGI::Carp qw(fatalsToBrowser);

this should give more detail about the error. Like our friend says, check the text file permissions, and check that the script has execute permissions.
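Putting the suggestion together, a sketch of the full script with the CGI::Carp line added (file name and path are the ones used earlier in this thread; the script is written to a file here so its contents can be checked without a web server):

```shell
cat > showfile.cgi <<'EOF'
#!/usr/local/bin/perl -w
use CGI::Carp qw(fatalsToBrowser);   # fatal errors appear in the browser instead of a bare 500

print "Content-type: text/html\n\n";

$path = 'stop_words.txt';
open(FH, '<', $path) or die "Cannot open $path: $!";   # the die() message now reaches the browser
while ($line = <FH>) {
    print $line;
}
close(FH);
EOF

grep -n 'CGI::Carp' showfile.cgi   # confirm the diagnostic line is in place
```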

~~~~~~~~~~~~~~~~~~~~~~~~~~~
The secret lemonade drinker
 
sdesarAuthor Commented:
I changed the permissions and also added the suggestion above!
I don't get an error.

But I still see no data when the submit button is hit!

What could be the reason for this behavior?

Thanks!
 
sdesarAuthor Commented:
Awaiting a response.. please help!
 
elminsterCommented:
I've tried the above on my linux box and it works OK.

However, if the 'words' file is not readable, nothing is displayed - not even an error, so check that the permissions on the 'words' file are correct.

Also - I've noticed that in your first post you called the file 'words.txt', but in your script you call 'stop_words.txt'. Check that you're using the right filename (and path) in the script.

What OS are you running on?
 
sdesarAuthor Commented:
I am very sorry for the late response... I never got an email on the response and I forgot to check again.

Thanks a million! It works now!

****** However, I cannot understand why it's not working when I implement this on my site, which actually opens stop_words.txt, deletes all those words, and then does its analysis of the best words, etc. To see what I mean,
please check out -
**** http://208.56.56.72/test2/index.html

**** Also, is there a way to find out what versions of the modules are installed on this server, i.e. what version of Lingua::EN::Fathom?

*** I am using Windows NT to upload my site to the servers at ehost4u.com... I am guessing they use UNIX... I don't know.

The code that does the chopping of the stop_words is in
PageParser.pm

Here's the code:-

package PageParser;

use strict;
use FindBin qw($Bin);

# Simple extension of Treebuilder to parse through a file, and add
# in the capability to skip through the text

use Lingua::EN::Fathom;

use HTML::TreeBuilder;
use HTML::Element;
use HTML::FormatText;
use URI::URL;

use Vector;
use Lingua::Stem;


my %defaults = (
      
      HTML => undef,      # The HTML as a text string
      KEYWORDS => undef,      # Keywords as a hash, count of keys in analyze
      URL => undef,      # The path to wherever
);

sub new {
    my $class = shift;

    my %extra;
    my $self = {};
    bless($self, $class);

    @$self{keys %defaults} = values %defaults ;
    if(@_) {
        my %stuff = @_;
        @$self{keys %stuff} = values %stuff;
    }
    $self->init();

    return $self;
}


sub init {
    my $self = shift;

    if($self->{HTML}) {
      $self->{TEXT} = $self->as_Text();
      $self->txtAnalyze();
      $self->{VECTOR} = Vector->new( $self->getStemParagraphs() );
    }
}

sub as_Text {
      my $self = shift;
      return($self->dumpText2());
}

# Use lynx to get it into a more reasonable ascii format
sub dumpText2 {
      my $self = shift;
      my $ret = "";
      my $url = $self->{URL};

      my @text = split /\r?\n/, `lynx -dump -width=4000 $url`;
      
      # Pass 1 -- clear out blank lines, [image] lines, and
      #  meaningless junk
      my @text1 = map {
            my $line = $_;
            # remove long meaningless --- and ___ strings
            $line =~ s/[!@#$%^&*()\-_+=][!@#$%^&*()\-_+=]+//;
            if( $line =~ m/^\s*\[.*\]\s*$/ ) {
                  $line = undef;
            }
            $line
      } @text;

      # Pass 2 -- Look for short lines that are probably header lines to
      #  paragraphs.  Combine those into the next paragraph.  Look for
      #  long lines that may have been broken up.  Should have a 7 space
      #  indent on them.  Join those lines together.

      my $line;
      while (@text1) {   # loop until the list is empty ($#text1 would skip the last line)
            $line = shift(@text1);
            if($line =~ m/^\s*$/) {
                  next;
            }
            if( $line =~ m/^\s*References\s*$/) {
                  $self->references(\@text1);
                  last;
            }
            
            # Arbitrarily assume that a header line is less than 50 chars
            if(length($line) < 50 ) {
                  my $line2 = shift(@text1);
                  if($line2) {
                        if(length($line2) > 50)  {
                              $line .= "\n$line2";
                        }
                        else {
                              unshift @text1, $line2;
                        }
                  }
                  else {
                        unshift @text1, $line2;
                  }
            }
            while(length($line) > 60) {
                  my $line2 = shift(@text1);
                  if(!$line2) {
                        last;
                  }
                  if($line2 =~ m/^       \w+/) {
                        $line .= $line2;
                  }
                  else {
                        unshift @text1, $line2;
                        last;
                  }
            }

            # trim off useless space
            $line =~ s/^\s*(.*)\s*$/$1/;

            $ret .= "<P>\n$line\n</P>\n";
      }
      return $ret;
}
# This get the lynx reference set
# found at the end of the lynx output
sub references {
      my $self = shift;
      my $text = shift;

      my %ref = ();

      for( @$text) {
            if( m@(\d+)\.\s*([^\s#]+)\s*@ ) {
                  my $num = $1;
                  my $rf = $2;
                  $rf =~ s@/$@@;
                  if($rf !~ m/mailto/i) {
                        $ref{$num} = $rf;
                  }
            }
      }
      $self->{REFERENCES} = \%ref;

      # Now go and get the interesting references from the html
      my $html = $self->{HTML};
      my @str = ();
      my %hstr = ();

      $html =~ s{<A\s.*?HREF="?([^\s">]*)[^>]*?>(.*?)</a>}{ push @str, ($1, $2); "$&" }iges;
      while(scalar(@str)) {
            my $href = shift @str;
            my $val = shift @str;
            $href = url($href, url($self->{URL}))->abs->as_string;
            $href =~ s@/$@@;
            $val =~ s/<[^>]*>//gs;
            $val =~ s/^\s*(.*)\s*$/$1/;
            $hstr{$href} = $val if $val ;
      }
      $self->{HTMLREFERENCES} = \%hstr;
}

# Pass through the paragraph, reconstructing links that were in the original
# HTML except now they go through the Javascript stuff.
sub hideLinks {
      my $self = shift;
      my $par = shift;
      my $str = $self->{HTMLREFERENCES};
      my $rf = $self->{REFERENCES};

      return $par unless $rf;
      return $par unless $str;

      my @num = ( $par =~ m/\[(\d+)\]/gs );

      for (@num) {
            if($rf->{$_}) {
                  my $re = "\\[$_\\]\\s*".$str->{$rf->{$_}}.'\s*';
                  unless ($par =~ s{$re}{\[$_\]}) {
                        $par =~ s/\[$_\]//;
                  }
            }
      }
      return $par;
}

# Recover the links that lynx inconveniently put at the end of the file
# and put them in their rightful place in the document.
sub restoreLinks {
      my $self = shift;
      my $par = shift;
      my $str = $self->{HTMLREFERENCES};
      my $rf = $self->{REFERENCES};

      return $par unless $rf;
      return $par unless $str;

      my @num = ( $par =~ m/\[(\d+)\]/gs );

      for (@num) {
            if($rf->{$_}) {
                  my $re = "\\[$_\\]\\s*";
                  $par =~ s{$re}["<A HREF=\"javascript:linkto('".$rf->{$_}."')\">".$str->{$rf->{$_}}."</A>"]gei;
            }
      }
      return $par;
}


############################################################
# The following methods are adaptation from url_fathom
# or my interpretation of what should happen

# txt_analyze --
#  1. create a text version of the html
#  2. strip it of "stop words", and send it to fathomAnalyze
#  3. create a list of the words on a total basis
#  4. take the top N of these words as Keywords
#  5. create a html document which analyzes each paragraph
#     highlights each keyword as a link to the previous keyword
#     for each paragraph that contains 1 or more keywords, print a list of
#     the keyword frequencies, and a skip anchor which sends it from
#     one paragraph to the next.

# Strip out "stop words" (common prepositions, articles, etc.)

sub txtAnalyze {
      my $self = shift;
      my $text = $self->{TEXT};

      my $stopwords = $self->{STOPWORDS};
      unless( $stopwords ){
            $stopwords = $self->stop_words();
            $self->{STOPWORDS} = $stopwords;
      }

      # remove all the stopwords
      my $re =  "(\\b" . join ("\\b|\\b", @$stopwords) . "\\b)";
      my $txt = $text;
      $txt =~ s/$re//gis;
      $self->{REDUCEDTEXT} = $txt;
      
      # Fathom analyze to get keywords
      my $keywords = $self->fathomAnalyze($txt);

      # nip the rest down to their stems ...
      my $stemmer = Lingua::Stem->new( -locale => 'EN');
      $stemmer->stem_caching({-level => 1});
      my $stems = $stemmer->stem(@$keywords);
      my %stemhash = ();
      for (0..$#$keywords) {
            $stemhash{$keywords->[$_]} = $stems->[$_];
      }
      $self->{STEMS} = \%stemhash;

      # Now, do a re-count based on stemmed words
      my $fathom = $self->{FATHOM};
      my %uniq_words = $fathom->unique_words;
      my %keycount;
      for (keys %uniq_words) {
            $keycount{$stemhash{$_}} += $uniq_words{$_};
      }
      $self->{STEMCOUNT} = \%keycount;

      # Now, get the top 10 keywords

      ($self->{STEMKEYWORDS}, $self->{KEYWORDS}) = $self->getStemKeywords(10);
}

# dump out the parsed text, putting in some html tags to make the
# display more useful.
sub dumpParseText {
      my $self = shift;
      my $keywords = $self->{KEYWORDS};
      my $ret = "";
      my $vector = $self->{VECTOR};

      # Create the HTML document
      my $re =  "(\\b" . join("\\b|\\b", @$keywords ) . "\\b)";
      my %kws = ();
      map { $kws{$_} = 0; } @$keywords;

      $ret = $self->dumpTopKeywords();
      my $txt = $self->{TEXT};
      my $parno = 0;      # paragraph number
      my $lastparno = 0;
      my $curpar = 0;
      my @klist = ();
      # Scan by paragraph to set the links
      $ret .= "<TABLE cellpadding=0 cellspacing=0>\n";
      while( $txt =~ m@<P>\n(.*?)\n</P>@gis ) {
            my $par = $self->hideLinks($1);
            my %kw = ();
#            if( $par =~ s{$re}[ $kw{lc($&)} = 1; "<A HREF=#" . lc($&) . $kws{lc($&)}++ . " NAME=" . lc($&) . $kws{lc($&)} . " >$&</A>" ]gei ) {
            $ret .= "<TR class=Bg1><TD><A NAME=p$curpar HREF=\"javascript:pswitch($curpar)\">$curpar.</A>&nbsp;&nbsp;";
            if( $par =~ s{$re}[$kw{lc($&)} = 1; "<B>$&</B>"]gei ) {
                  # Wow, there are keywords in this paragraph!
                  push @klist, ("\"par$curpar\"");
                  $parno += 1;
                  $ret .= "<A NAME=par$curpar>KEYWORDS: " . join(" &nbsp;", keys %kw) . "</A>";
            }
            $ret .= "</TD></TR>\n";
            $par = $self->restoreLinks($par);

            $par =~ s/\n/<br>\n/gis;

            $ret .= "<TR class=Bg2><TD>" . $par . "</TD></TR>\n";
            $curpar += 1;
      }
      $ret .= "</TABLE>\n";

      if($self->{MXDISPLAY}) {
            $ret .= "<P>" . $vector->mxdisplay();
      }
      if($self->{VFDISPLAY}) {
            $ret .= "<P>" . $vector->vfdisplay();
      }

my $js = <<"EOT";
<STYLE>
BODY
{
   BACKGROUND-COLOR: white;
   FONT-FAMILY: Verdana, Arial, Helvetica;
   FONT-SIZE: 8pt;
   MARGIN: 4px
}
TD
{
   FONT-FAMILY: Verdana, Arial, Helvetica;
   FONT-SIZE: 8pt
}
.Bg1
{
   BACKGROUND-COLOR: #dddddd
}
.Bg2
{
   BACKGROUND-COLOR: #eeeecc;
   FONT-SIZE: 10pt;
}
.Bg3
{
   BACKGROUND-COLOR: #dddddd;
   FONT-SIZE: 8pt;
}
</STYLE>
<script language=javascript>
EOT
$js .= "var klist = new Array(" . join( ",", (@klist)) . ");\n";
$js .= <<"EOT";
parent.menu.setParNo(klist.length);
function kvalue(n) {
      return klist[n];
}
EOT
$js .= $vector->mxscript();
$js .= <<"EOT";
function pswitch(par) {
   // alert("pswitch -- "+par);
   parent.menu.setParLink(par, plist[par].length);
}
function setLocation(par, n) {
      // alert("setLocation -- "+par+" "+n);
      // alert("Hash set " + plist[par][n]);
      parent.menu.setParValue(vlist[par][n], plist[par][n]);
      // document.hash.location = '#' + plist[par][n];
      // alert("Hash Location -- "+document.hash.location);
}
function linkto(url) {
      parent.nav.newurl(url);
}

</script>
EOT
return ($js , $ret);
}
      
# Read in the stopwords.
sub stop_words {
      my $self = shift;
      open KW, 'stop_words.txt' or die "Cannot open stop_words.txt: $!";
      my @kw = map { chomp; $_ } <KW>;
      close KW;

      return \@kw;
}

########## CREATE A BEST WORDS LIST

sub fathomAnalyze {
      my $self = shift;
      my $txt = shift;
      my $fathom = $self->{FATHOM};

      if(!$fathom) {
            $fathom = Lingua::EN::Fathom->new;
            $self->{FATHOM} = $fathom;
            $txt =~ s@</?P>@@gis;
            $fathom->analyse_block($txt);
      }
      my %uniq_words = $fathom->unique_words();

      my @kws = keys %uniq_words;
      return \@kws;
}

# dump the top 10 keywords
sub dumpTopKeywords {
      my $self = shift;
      return $self->dumpKeywords('num', 10);
}

sub dumpKeywords {
      my $self = shift;
      my $dir = shift;   # sort by either alpha, or num
      $dir = "alpha" unless $dir;
      my $len = shift;
      $len = 0 unless $len;

      my %uniq_words = %{$self->{STEMCOUNT}};
      my $word;
      my $ret;

      my @list = sort keys %uniq_words;
      if($dir eq 'num') {
            @list = sort { $uniq_words{$b} <=> $uniq_words{$a} }  keys %uniq_words;
      }

      if($len) {
            splice @list, $len;
      }

      $ret = "<TABLE>\n";
      foreach $word ( @list )
      {
            $ret .= "<TR><TD ALIGN=right>" . $uniq_words{$word}. "</TD><TD>$word</TD></TR>\n"; # outputs the word and frequency.
      ##                  print OUT ("$word\n"); # prints just the words

      }
      $ret .= "</TABLE>\n";
      return $ret;
}

# Get the top n Stem Keywords.  Also generate the equivalent array
# of real keywords (which will have more than n keys, and display unstemmed)
sub getStemKeywords {
      my $self = shift;
      my $len = shift;
      my $stems = $self->{STEMS};
      my $stemcount = $self->{STEMCOUNT};
      my @list = sort { $stemcount->{$b} <=> $stemcount->{$a} }  keys %$stemcount;
      splice @list, $len;

      # now find all the words in the other list
      my @klist = ();
      for (keys %$stems) {
            my $w = $_;
            for (@list) {
                  if($stems->{$w} eq $_) {
                        push @klist, $w;
                        last;
                  }
            }
      }

      return( \@list, \@klist );
}

# replace words with stemmed words in the paragraphs.
sub getStemParagraphs {
      my $self = shift;
      my $text = $self->{REDUCEDTEXT};

      # Get list of multi-word stems
      my $stems = $self->{STEMS};
      my %seen = ();
      map {  $seen{$_} ++ } values %$stems;
      my @list = grep { $seen{$stems->{$_}} > 1} keys %$stems;

      # replace those words in the text
      my $re =  "(\\b" . join("\\b|\\b", (@list) ) . "\\b)";
      $text =~ s{$re}[ $stems->{lc($&)} ]gise;

      # break into paragraphs
      my @plist = ( $text =~ m@<P>\n(.*?)\n</P>\n@gis );

      # create a key list
      my %kseen = ();
      my @klist = grep { ! $kseen{$_} ++ } map { $seen{$stems->{$_}} > 1 ? $stems->{$_} : $_  } keys %$stems;

      return (\@plist, \@klist);
}


sub getKeywords {
      my $self = shift;
      my $fathom = $self->{FATHOM};
      my %uniq_words = $fathom->unique_words;
      # my @list = grep { $uniq_words{$_} > 1 } keys %uniq_words;
      my @list = keys %uniq_words;

      return \@list;
}

sub getParagraphs {
      my $self = shift;
      my $text = $self->{TEXT};
      my @list = ( $text =~ m@<P>\n(.*?)\n</P>\n@gis );

      return \@list;
}

1;




Thanks,
Awaiting a response soon....
 
sdesarAuthor Commented:
Thanks for all the help!

It all works now!
Question has a verified solution.
