How do I write a CGI script for a fetch routine written in Perl?

I have written the following routine to fetch any URL and convert it to ASCII.

However, I am getting errors when I write a CGI script that calls this routine.

I want the CGI script to present a form like this:

Enter URL :  http://www.aol.com SUBMIT

When the form is submitted, the script should call the fetch routine on the URL from the text box, but it is not working in the browser.

I ran this Perl routine from the command line and it works fine.

Any suggestions, please?



#!/usr/bin/perl

# File: sub_fetch_url (from Perl Cookbook) -- modified for use of SUBROUTINES

# Usage: provide URL as command line argument; will (a) fetch contents
#        of document, (b) copy it to file url.copy, (c) display contents
#        to standard output, and (d) remove all <...> ("html"-tags), and
#        display resulting text in ASCII

# subroutines are:
#                  fetch -- gets text from url
#                  cp_to_file -- copies text to file
#                  plain_text -- converts html into plain text

use LWP::Simple;
use HTML::Parse;
use HTML::FormatText;

$URL = shift; # get command line arg(s)

$content = fetch($URL);
print $content;

cp_to_file($content, "url.copy");

$plain_text = plain_text($content);
print "\n\n\n";
print $plain_text;


# ---- subroutines ------------

# fetches URL and returns document text as is (html)

sub fetch {
    my ($url) = @_;
    my $cont;

    $cont = get($url);
    return $cont;
}

# copies text to file

sub cp_to_file {
    my ($text, $to_file) = @_;

    open(OUT, ">" . $to_file);
    print OUT $text;
    close(OUT);
}

# converts html text into plain text; (simplistic approach)

sub plain_text {
    my ($in_text) = @_;
    my $plain;

    ($plain = $in_text) =~ s/<[^>]*>//gs;

    return $plain;
}
sdesarAsked:

sdesarAuthor Commented:
Edited text of question.
renfieldCommented:
You need to capture the user input from a browser doc:

use CGI;

my $cgi = new CGI;
my $url = $cgi->param('url');

if ( $url ) {
   # we have user input
   # continue with yer program
}
else {
   # output the form for the user
   print $cgi->header;
   print $cgi->html_start;
   print $cgi->form_start;
   print $cgi->textbox('url');
   print $cgi->form_end;
   print $cgi->html_end;
}


Check the CGI module for exact syntax, but the basic idea is:
Give the user the html form with a text box to input the url first,
then process the input when the form is submitted.
sludinCommented:
A few things:  

- You may have spelled HTML::Parser wrong.  My version at least has the 'r' on the end.

- You need to print a "Content-Type: text/plain\n\n" header line.

- You cannot simply shift the 'arguments'.  You need to read from the QUERY_STRING environment variable in the case of a GET, or read from STDIN in the case of a POST.  I suggest using GET as it is much easier (in my opinion).

- I could not find the HTML::FormatText module, and since I noticed you were not referencing it I commented it out.

- Instead of the way I did it below, you could also use the CGI.pm module to aid in reading parameters from a POST or GET.

- Below the code is a sample HTML page.

Good luck!

-stephen

Here is the main body of my version:

use LWP::Simple;
use HTML::Parser;
#use HTML::FormatText;
use URI::Escape;

# extract the url parameter from the query string
($URL) = $ENV{QUERY_STRING} =~ /url=([^ &]*)/;
# un 'urlencode' the param
$URL = uri_unescape( $URL );

$content = fetch($URL);

# print the required header
print "Content-Type: text/plain\n\n";

# the rest is the same
....


The HTML:

<html>
<body>
<form action="tmp.pl" method="get">
<input type="text" name="url"><br>
<input type="submit">
</form>
</body>
</html>
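
For reference, the GET/POST handling described in this post can be sketched without any modules. This is only a minimal illustration, not the code sludin used: read_params and url_decode are hypothetical helper names, and the sketch ignores multipart uploads and repeated parameters.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: '+' means space, %XX is a hex byte.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Return a hash of form parameters for the current request.
# GET data arrives in QUERY_STRING; POST data arrives on STDIN,
# with its length given by CONTENT_LENGTH.
sub read_params {
    my $method = uc($ENV{REQUEST_METHOD} || 'GET');
    my $raw = '';
    if ($method eq 'POST') {
        read(STDIN, $raw, $ENV{CONTENT_LENGTH} || 0);
    }
    else {
        $raw = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
    }

    my %params;
    for my $pair (split /[&;]/, $raw) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $params{ url_decode($name) } = url_decode(defined $value ? $value : '');
    }
    return %params;
}
```

A caller would use it as `my %p = read_params(); my $url = $p{url};`. Splitting on the separators matches the parameter name exactly, whereas a pattern such as /url=([^ &]*)/ would also match a parameter named, say, myurl.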



sdesarAuthor Commented:
Thanks for your efforts.

When I enter the URL in the text box on the url_fetch.html page, the url_fetch.cgi routine should fetch the URL, convert it to ASCII, and store the contents in a text file.
However, no data appears in the browser or in the text file.

This is what I did:
File 1 - url_fetch.cgi

#!/usr/bin/perl

use LWP::Simple;
use HTML::Parser;
#use HTML::FormatText;
use URI::Escape;

# extract the url parameter from the query string
($URL) = $ENV{QUERY_STRING} =~ /url=([^ &]*)/;
# un 'urlencode' the param
$URL = uri_unescape( $URL );

$content = fetch($URL);

# print the required header
print "Content-Type: text/plain\n\n";


use CGI;

my $cgi = new CGI;
my $url = $cgi->param('url');

if ( $url ) {
   # we have user input
   # continue with yer program
}
else {
   # output the form for the user
   print $cgi->header;
   print $cgi->html_start;
   print $cgi->form_start;
   print $cgi->textbox('url');
   print $cgi->form_end;
   print $cgi->html_end;
}
#### subroutines
sub fetch {
    my ($url) = @_;
    my $cont;

    $cont = get($url);
    return $cont;
}

# copies text to file

sub cp_to_file {
    my ($text, $to_file) = @_;

    open(OUT, ">" . $to_file);
    print OUT $text;
    close(OUT);
}

# converts html text into plain text; (simplistic approach)

sub plain_text {
    my ($in_text) = @_;
    my $plain;

    ($plain = $in_text) =~ s/<[^>]*>//gs;

    return $plain;
}

File 2- url_fetch.html

<html>
<body>
<form action="url_fetch.cgi" method="get">
<input type="text" name="url"><br>
<input type="submit">
</form>
</body>
</html>

sdesarAuthor Commented:
I implemented your suggestions,
as I stated in the comment above.
However, when I run the script, the browser shows the entire url_fetch.cgi source code instead of the contents of the URL that I entered in
url_fetch.html.

Any suggestions, please?
sdesarAuthor Commented:
PS: I gave you excellent points. Were they recorded?
sdesarAuthor Commented:
Could someone please help me with this script?
sludinCommented:
Did you make the script executable? ( 755 permissions or some combination like that? )
sdesarAuthor Commented:
Yes, I ran chmod 755 url_fetch.html url_fetch.cgi.

Also, the error states "document contains no data".
It seems like it's not recognizing the fetch subroutine.

Thanks for helping.
sludinCommented:
Could you send me the exact script you are using now again?  The last update has two distinct methods going on: one using CGI.pm, one without.

Your supposition is probably right.  Send the script and I will test it here.

-steve
sdesarAuthor Commented:
Yes... This is it...
File 1 - url_fetch.cgi
File  2 - url_fetch.html

File 1 - url_fetch.cgi

#!/usr/bin/perl

use LWP::Simple;
use HTML::Parser;
#use HTML::FormatText;
use URI::Escape;

# extract the url parameter from the query string
($URL) = $ENV{QUERY_STRING} =~ /url=([^ &]*)/;
# un 'urlencode' the param
$URL = uri_unescape( $URL );

$content = fetch($URL);

# print the required header
print "Content-Type: text/plain\n\n";


use CGI;

my $cgi = new CGI;
my $url = $cgi->param('url');

if ( $url ) {
   # we have user input
   # continue with yer program
}
else {
   # output the form for the user
   print $cgi->header;
   print $cgi->html_start;
   print $cgi->form_start;
   print $cgi->textbox('url');
   print $cgi->form_end;
   print $cgi->html_end;
}
#### subroutines
sub fetch {
    my ($url) = @_;
    my $cont;

    $cont = get($url);
    return $cont;
}

# copies text to file

sub cp_to_file {
    my ($text, $to_file) = @_;

    open(OUT, ">" . $to_file);
    print OUT $text;
    close(OUT);
}

# converts html text into plain text; (simplistic approach)

sub plain_text {
    my ($in_text) = @_;
    my $plain;

    ($plain = $in_text) =~ s/<[^>]*>//gs;

    return $plain;
}

File 2- url_fetch.html

<html>
<body>
<form action="url_fetch.cgi" method="get">
<input type="text" name="url"><br>
<input type="submit">
</form>
</body>
</html>


sludinCommented:
OK.  Here is a script that works for me.  I am guessing that this is what you want:

1 - Fetch a URL
2 - Remove HTML tags
3 - Save the processed doc
4 - Redirect the user to that doc

If that isn't precisely what you want, you should be able to modify the script below to fit your needs.

One of the problems you were having was the spelling of the CGI.pm methods.  

If you have any questions about the script go ahead and ask:

-stephen

---

#!/usr/bin/perl

use LWP::Simple;
use HTML::Parser;
use CGI;

my $cgi = new CGI;
my $url = $cgi->param('url');

my $HTTP_ROOT = "/home/httpd/";

if ( $url ne "" )
{
      $content = fetch( $url );
      if ( $content ne "" )
      {
            #print $cgi->header( -type => 'text/plain' );
      #   print $content;
            my $plain_text = plain_text( $content );

            #print $plain_text, "\n";

            cp_to_file( $plain_text, "$HTTP_ROOT/cgi-bin/tmp/file.txt" );
            print $cgi->redirect( "/cgi-bin/tmp/file.txt" );

      }
      else
      {
            output_form( "Could not load URL: $url<br>" );
      }
}
else
{
      output_form( "Enter URL to fetch" );
}

sub output_form
{
      my $msg = shift;

      # output the html header
      print $cgi->header( -type => 'text/html' );

      # print the message if there is one
      print "$msg<br>\n";

      # output the form for the user
      print $cgi->start_html;
      print $cgi->start_form;
      print $cgi->textfield('url');
      print $cgi->br;
      print $cgi->submit( -label => 'Fetch' );
      print $cgi->end_form;
      print $cgi->end_html;
}

#### subroutines
sub fetch {
      my ($url) = @_;
      my $cont;

      $cont = get($url);
      return $cont;
}

# copies text to file

sub cp_to_file {
      my ($text, $to_file) = @_;

      open(OUT, ">" . $to_file);
      print OUT $text;
      close(OUT);
}

# converts html text into plain text; (simplistic approach)

sub plain_text {
      my ($in_text) = @_;
      my $plain;

      ($plain = $in_text) =~ s/<[^>]*>//gs;

      return $plain;
}
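
The cp_to_file routine above ignores a failed open, so a permissions problem shows up only as an empty or missing file. A minimal sketch of a checked variant follows. It keeps the same name and arguments, but the return value and the log message are assumptions added for illustration and are not part of the posted script.

```perl
use strict;
use warnings;

# Write $text to $to_file.
# Returns 1 on success, or 0 after logging the reason for the failure.
sub cp_to_file {
    my ($text, $to_file) = @_;

    my $out;
    unless (open($out, '>', $to_file)) {
        # STDERR normally ends up in the web server's error log.
        print STDERR "cp_to_file: cannot write $to_file: $!\n";
        return 0;
    }
    print {$out} $text;
    close($out) or return 0;
    return 1;
}
```

The caller can then test the result, for example by showing the form again with a "could not save file" message instead of redirecting to a file that was never written.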
sdesarAuthor Commented:
I did exactly as it states above and I am receiving an error stating-
500 Internal Server Error..

Could you please inform me what I should do next?

Thank You.
sludinCommented:

The 500 error basically means the CGI program is failing.  This could be for many reasons, but it is often directory permissions.  You could try adjusting the permissions and make certain the script has write permission where it saves the file.  You can always use the 'comment out code until the error disappears' trick to find the exact culprit.

Alternatively you can just send the plain text back like this:

if ( $content ne "" )
{
      print $cgi->header( -type => 'text/plain' );
      my $plain_text = plain_text( $content );
      print $plain_text;
}
else
{
      output_form( "Could not load URL: $url<br>" );
}


Tell me how it works.
sdesarAuthor Commented:
I changed the permissions with chmod 755 on the *.html and *.cgi files.

I am still receiving the same error. Any other tips?
sludinCommented:
Make sure the directory itself has the correct permissions.  You probably do not want to give write permissions to your cgi-bin, so use something like httpd/tmp/.  Change your script to read/write from that directory.

Next try brute force.  Start with commenting out all of the main body code.  Leave just:

print "Content-type: text/plain\n\n";
print "Test";

The two newlines are vital at the end of the header.

This should bring back the text 'Test' in a browser.

Next start bringing in the functionality of the script by uncommenting the code.  

You can also try executing the script from the command line.  When the CGI.pm module prompts you for inputs, enter url=http://yoursite.com/index.html.  Then press ^D and the script should run (or fail, if there is a problem).  You should see any errors that occur.

Welcome to the world of CGI debugging.  

Let me know what you find and where it breaks.

-steve
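
Another way to narrow down a 500 error, sketched here under assumptions, is to run the main body inside eval and send any failure back to the browser as plain text. run_cgi is a hypothetical helper name and the unwritable path is made up for the demonstration. This catches run-time errors only; a syntax error still has to be found with perl -c or in the server log.

```perl
use strict;
use warnings;

# Run $code, which must return the complete response (header plus body).
# If it dies, return a plain-text error page instead of letting the
# server answer with a bare 500.
sub run_cgi {
    my ($code) = @_;
    my $response = eval { $code->() };
    if (!defined $response) {
        my $err = $@ || "handler returned nothing\n";
        return "Content-Type: text/plain\n\nScript error: $err";
    }
    return $response;
}

# Demonstration: the open fails, so the browser would see the reason.
print run_cgi(sub {
    open(my $out, '>', '/no/such/dir/file.txt')
        or die "cannot write file: $!\n";
    return "Content-Type: text/plain\n\nTest\n";
});
```

The CGI.pm distribution also ships CGI::Carp, whose fatalsToBrowser import serves a similar purpose.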
sdesarAuthor Commented:
Thanks, Steve, for your time and effort. My script works.
YEAPPY!

It was the permissions on the files in httpd/tmp, etc.

Well, there's a minor problem.
I want to clear the contents of the text field once the *.cgi script has run, because
if the user enters a wrong URL, the text box should be cleared.
In the current script the WRONG URL remains.
Here's my test.cgi script.

It's the textfield 'url' that keeps the old URL.
How can I make it blank?

File : test.cgi


sub output_form
{
      my $msg = shift;

      # output the html header
      print $cgi->header( -type => 'text/html' );

      # print the message if there is one
      print "$msg<br>\n";

      # output the form for the user
      print $cgi->start_html;
      print $cgi->start_form;
      print $cgi->textfield('url');
      print $cgi->br;
      print $cgi->submit( -label => 'Fetch' );
      print $cgi->end_form;
      print $cgi->end_html;
}

Awaiting your response.
sludinCommented:
Try this:  Substitute the textfield line with:

print $cgi->textfield( -name=>'url', -value=>'');

This explicitly sets the field to blank.

-steve
                 
sdesarAuthor Commented:
I tried your suggestion...but it does not work.
The text field is not blank.
sludinCommented:
Here you go:

print $cgi->textfield( -name=>'url', -override=>' ' );
sdesarAuthor Commented:
It Worked!!

Thanks Steve.  
sludinCommented:
No problem.  Enjoy.
sdesarAuthor Commented:
Hi Steve,

How can I keep
http://www.
in the text box all the time?
sludinCommented:
Off the top of my head I would say:

print $cgi->textfield( -name=>'url', -override=>'http://www.' );

If that doesn't work I will look into it.

-steve
sdesarAuthor Commented:
Finally the servers are up at my school. I got to try this suggestion, but it did not work. Could you give me another way to display
http://

Thanks
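
A note on why the attempts in this thread behaved as they did, going by CGI.pm's documented behaviour: form fields are "sticky", so a value the user already submitted wins over -default (also spelled -value). The -override option is a true/false flag, not the text to display. The sketch below assumes CGI.pm is installed and uses a made-up host name to simulate a previously submitted URL.

```perl
use strict;
use warnings;
use CGI;

# Simulate a request in which the user already submitted a bad URL.
# (bad.example is a placeholder host, not taken from the thread.)
my $cgi = CGI->new('url=http://bad.example/');

# Sticky behaviour: the submitted value wins over -default.
my $sticky = $cgi->textfield(-name => 'url', -default => 'http://www.');

# With -override => 1, CGI.pm ignores the submitted value and uses -default.
my $forced = $cgi->textfield(
    -name     => 'url',
    -default  => 'http://www.',
    -override => 1,
);

print "$sticky\n$forced\n";
```

Passing -default => '' together with -override => 1 gives an empty box in the same way.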
sdesarAuthor Commented:
If there is a link to another document in the one that I fetched,
is there a way so that when the user clicks on that link within the document, the second document is also converted directly to ASCII?

Sludin, could you please provide suggestions?
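
The thread leaves this question open. One possible approach, sketched under assumptions, is to collect the links before plain_text strips the tags and to emit each one as an anchor that points back at the CGI script with the target in the url parameter. proxied_links, url_encode and html_escape are hypothetical helper names. The sketch handles absolute http(s) links only and does not resolve relative links or decode entities inside href values.

```perl
use strict;
use warnings;

# Percent-encode every character that is not an unreserved URL character.
sub url_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Escape the characters that are special in HTML text and attributes.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Find absolute http(s) links in $html and return one HTML anchor per link
# that points back at $script. Relative links are ignored in this sketch.
sub proxied_links {
    my ($html, $script) = @_;
    my @anchors;
    while ($html =~ /<a\b[^>]*\bhref\s*=\s*["']?(https?:[^"'\s>]+)/gi) {
        my $target = $1;
        push @anchors, sprintf('<a href="%s?url=%s">%s</a>',
            $script, url_encode($target), html_escape($target));
    }
    return @anchors;
}
```

Because plain text has no clickable links, the response would need to be sent as text/html, for example with the stripped text followed by the anchors returned from proxied_links($content, 'url_fetch.cgi').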