[Webinar] Streamline your web hosting managementRegister Today

x
?
Solved

How to refresh Files?

Posted on 2000-02-03
10
Medium Priority
?
236 Views
Last Modified: 2010-03-05
How can I refresh files that are receiving the outputs?

Currently I have to HIT the REFRESH BUTTON everytime I enter a new URL and convert it to ASCII....

How can I refresh the files without having to hit REFRESH?


Here' the  code.. url.cgi
#!/usr/bin/perl

##-I/web/public/grad/sdesar
################################################################################################
#This script does the following
#1. file.txt- converts html to ascii
#2. file.html- fetch the html file.
#3. fileParse.txt - Parse the articles, prepositions, etc. to create the keywords to be analyzed.
#4. fileHeader.html- gets the headers from file.html and creates anchors.
#5. stop_words- List of words which are to be  parsed.... a, the, and, etc.,
#   I got this from http://www.nzdl.org
#
#This file contains the following Routines
#
#1. convert to plain_text
#2. copy HTML file
#3. parse data
#4. get all the headers and place it after the <BODY> tag and make them anchors
#Here's the list of INPUT AND OUTPUT FILES-
#file.txt - ASCII file
#file.html - HTML file
#fileParse.txt-  parses articles, prepositions etc
#                Input file - file.txt, stop_words
#                Output file - fileParse.txt
#fileHeader.html - Gets the Headers from HTML document
#               Input file - file.html
#                Output file - fileHeader.html
#fileKeys.txt    - Displays the 5 most freq. words per parah
#                  and 10 most freq. words in the entire document.
#                  Input - fileParse.txt
#                  Output - fileKeys.txt
#fileKW.html    - Displays the 10 mmost frew. words as BOLD in the Parsed file.
#                 Input - fileParse.txt
#                 Output - fileKW.html
#keywords.out - Parses the keywords from fileKeys.txt
#              Input-   fileKeys.txt
#               Output-  keywords.out
#fileBold.html - Bolds all the words.
#                Input- file.html
#                Input- keywords.out
#                Output- fileBold.html
#find.html       - Independent javascript to find keywords.
#
###############################################################################################

use LWP::Simple;
use HTML::Parser;
use CGI;

# require "boldparse.pl";
#require "wrapper.cgi";

my $cgi = new CGI;
my $url = $cgi->param('url');

my $HTTP_ROOT = "/web/public/grad/sdesar/tmp/file.txt";
my $HTTP_ROOT1 = "/web/public/grad/sdesar/tmp/file.html";
my $HTTP_ROOT2 = "/web/public/grad/sdesar/tmp/fileParse.txt";

if ( $url ne "" )
{
$content = fetch( $url );
if ( $content ne "" )
{
print $cgi->header( -type => 'text/plain' );
my $plain_text = plain_text($content);
print $plain_text;
print $content;
my $plain_text = plain_text( $content );

print $plain_text, "\n";

cp_to_file( $plain_text, "$HTTP_ROOT" );
print $cgi->redirect( "./tmp/file.txt" );

##########routine to parse data
print $cgi->header( -type => 'text/plain' );
open KW, 'stop_words';
@kw = map {chop;$_} <KW>;
close KW;

#form RE

$re = join '\b)|(\b','(\b', @kw,'\b)';

open TXT,'./tmp/file.txt';
while(<TXT>){
  s/$re//goi;
  $contentParse .= $_;
}
print $contentParse, "\n";


cp_to_file($contentParse, "$HTTP_ROOT2");
print $cgi->redirect( "./tmp/fileParse.txt" );

#print "\n\n\n";



##########copy the HTML

print $cgi->header( -type => 'text/plain' );
my $cp_to_file_html = cp_to_file_html($content);
print $cp_to_file_html;
print $content;
my $cp_to_file_html = cp_to_file_html( $content );

print $cp_to_file_html, "\n";

cp_to_file_html( $cp_to_file_html, "$HTTP_ROOT1" );
print $cgi->redirect( "./tmp/file.html" );

###############COPY TO KEYWORDS.out

print $cgi->header( -type => 'text/plain' );

$status=&parsenwrite("./tmp/fileKeys.txt","./tmp/keywords.out");
print $cgi->redirect( "./tmp/keywords.out" );
print $cgi->header( -type => 'text/plain' );
#print "Content-type: text/html\n\n";
if ($status){   ##  The Parse'n Write sub-routine was fine
        ##  Now read the html file and bold the keywords
        &makebold("./tmp/file.html","./tmp/keywords.out","./tmp/fileBold.html");
print $cgi->redirect( "./tmp/fileBold.html" );
}else{
        print "Error during parsewrite\n";
}




}
else
{
output_form( "Could not load URL: $url<br>" );
}
}
else
{
output_form( "Enter URL to fetch" );
}

sub output_form
{
my $msg = shift;

# output the html header
print $cgi->header( -type => 'text/html' );

# print the message if there is one
print "$msg<br>\n";

# output the form for the user
print $cgi->start_html;
print $cgi->start_form;
print "Please enter another URL:  ";
print $cgi->textfield( -name=>'url', -value=>'http://www.' );
#print $cgi->textfield('url');
print $cgi->br;
print $cgi->submit( -label => 'Fetch' );
print $cgi->end_form;
print $cgi->end_html;
}

print <<"PrintTag";
<html><head>
<title>CGI-Generated HTML</title>
</head><body>
<H2 align="center">WEB TEXTURIZER</H2>
<HR>
<H2> The  following files will be created: <H2>
<H3> Please hit RELOAD to REFRESH these files. <H3>


<UL>

<LI><A HREF="./tmp/file.txt"
               TARGET="results">
               Text Only Version</A>
        <LI><A HREF="./tmp/file.html"
               TARGET="results">
               HTML Version</A>
        <LI><A HREF="./tmp/fileKW.html"
               TARGET="results">
               KeyWords and BOLD Them-in Parsed File-- Do it in HTML-Headers</A>
        <LI><A HREF="./tmp/fileParse.txt"
               TARGET="results">
               Parsed  Version</A>
        <LI><A HREF="./tmp/fileHeader.html"
               TARGET="results">
              HTML  along with the Headers that have anchors created </A>
        <LI><A HREF="./tmp/find.html"
               TARGET="results">
               HTML  and find the keywords  -- JavaScript</A>
        <LI><A HREF="./tmp/fileKeys.txt"
               TARGET="results">
               Finds the most frequent words per Paragraph and Total</A>


        <LI><A HREF="./tmp/keywords.out"
               TARGET="results">
                Keywords in ASCII</A>

        <LI><A HREF="./tmp/keywords.html"
               TARGET="results">
                Keywords in   HTML  with anchors created Version</A>


        <LI><A HREF="./tmp/fileKeywords.html"
               TARGET="results">
                Finds Keywords  and creates the anchorsin BOLD HTML Version</A>



<LI><A HREF="/public/grad/sdesar/wrapper.cgi?f=file.txt">

               Text Only Version - file.txt</A>


</UL>
<HR>
</body></html>
PrintTag
#Line above has the magic word that
#makes the browser stop printing
#End of program

#print "Content-type: text/plain \n\n";
#print "TEST";




#### subroutines
sub fetch {
my ($url) = @_;
my $cont;

$cont = get($url);
return $cont;
}

# copies text to file

sub cp_to_file {
my ($text, $to_file) = @_;

open(OUT, ">" . $to_file);
print OUT $text;
close(OUT);
}

# copies file at HTML to a file

sub cp_to_file_html {
my ($text, $to_file) = @_;

open(OUT, ">" . $to_file);
print OUT $text;
return $text;
close(OUT);
}


# converts html text into plain text; (simplistic approach)
sub plain_text {
my ($in_text) = @_;
my $plain;

($plain = $in_text) =~ s/<[^>]*>//gs;

return $plain;
}

##############Clear cache

use CGI;

$query=new CGI;
my $file_name=$query->param('f');
my $file_path="/web/public/grad/sdesar/tmp/";

open(OUT,$file_path.$file_name) || die$!;


print "Content-type: text/html\n\n" if $file_name!~ /\.txt/;
print "Content-type: text/plain\n\n" if $file_name=~ /\.txt/;

print "<meta http-equiv=\"Pragma\" content=\"no-cache\">
<meta http-equiv=\"expires\" content=\"0\">";

while(<OUT>){print $_;}
close(OUT);




##############SUBROUTINE to add anchors to headers
open(FILE, "./tmp/file.html");
                   #   open(FILE, "$ARGV[0]");

                   #   @File = <FILE>;
                      @File = <FILE>;
                      $html = join(" ", @File);

                      close (FILE);

                      #Match Headers
                      (@headers) =($html=~m!<H\d>\s*(.*?)\s*</H\d>!isg);


                      #Convert all headers into named anchors
                      $html =~ s!(<H\d>\s*)(.*?)(\s*</H\d>)!$1<a name="$2">$2</a>$3!isg;

                      #Construct links to headers
                      foreach $header (@headers)
                      {
                      $links .= qq(\n<a href="#$header">$header</a><br>\n);

                      }

                      #Place links at top of page after <Body> tag
                      $html =~ s/(<body[^>]*>)/$1$links/i;
                      print $html;

                      #Write out new document

                      cp_to_file_html($html, "./tmp/fileHeader.html");
                      print "\n\n\n";


#############SUBROUTINE FOR KEYWORDS
sub keywords{
                         my $file = shift;
                         open FILE,"<$file" or die "can't open $file : $!";
                         my %wc=();
                         my %seen = ();
                         my %top;
                         my @words;
                         my $paragraphs='';
                         local $/='';
                         my @paragraphs = <FILE>;
                         close FILE;
                         for( @paragraphs ){
                             while( /(\w['\w-]*)/g ){
                                 $seen{lc $1}++;
                             }
                         }
                         @top{(sort {$seen{$b} <=> $seen{$a} } keys %seen)[0..9]} = ();
                         for( @paragraphs ){
                             %wc = ();
                             if( @words = grep {exists $top{lc $_} && !$wc{lc $_}++} /(\w['\w-]*)/g ){
                                 for my $w ( @words ){ s/\b(\Q$w\E)\b/<b>$1<\/b>/gi; }
                                 $paragraphs .= join ' ',"<h1>",@words,"</h1>:\n$_";
                             }
                         }
                         $paragraphs=~s/$/<br>/gm;
                         return $paragraphs;
                     }


$paragraphs=keywords('./tmp/fileParse.txt');
print "$_:$paragraphs";

open FILE2, ">./tmp/fileKW.html" or die "can't open fileKW because $!";
print FILE2 keywords("./tmp/fileParse.txt");
close FILE2;

#########################Routine to count the words per Parah and total

               open IN,"<./tmp/fileParse.txt" or die "can't open fileParse.txt:$!";
               open OUT,">./tmp/fileKeys.txt" or die "can't open fileKeys.txt:$!";
               {local $/='';
                  while( <IN> ){
                     %wc = ();
                     while( /(\w['\w-]*)/g ){
                         $seen{lc $1}++;
                         $wc{lc $1}++;
                     }
                     print OUT "paragraph $.\n";
                     for( (sort {$wc{$b} <=> $wc{$a} } keys %wc)[0..4] ){
                         print OUT "$_ : $wc{$_}\n";
                     }
                  }
               }
               print OUT "total\n";
               for( (sort {$seen{$b} <=> $seen{$a} } keys %seen)[0..9] ){
                  printf OUT "%5d %s\n", $seen{$_}, $_;
               }

################ROUTINE FOR BOLDFACE THE KEYWORDS IN HTML FILE

# require "boldparse.pl";
sub parsenwrite{
        ($filekeys,$keywords)=@_;

        open (FILEKEYS,$filekeys) || die "can't open $filekeys: $!\n";

        $ctr=0;
        while ($line=<FILEKEYS>){
                $ctr++;
                chomp($line);   ##      Remove the \n char
                ##      Ignore lines with paragraph 1, paragraph 2 etc & ...
                ##      line having only a ":" in it.
                next if $line=~ /^paragraph\s+\d+/ || $line=~ /^:$/;

                ##      Check for lines which have white spaces followed by
                ##      numbers and then have a word. Eg.    20 information
                if ($line=~ /\s+\d+\s+(.*)/){
                        $keywords{$1}=1;
                        next;   ##      Go for the next line
                }

                ##      All remaining lines WILL have the foll format
                ##      word : number
                @tmp=split(/:/,$line);

                if ($#tmp>0){   ##      The line has the above format.
                        $tmp[0]=~ s/\s+//g;     ##      Squeeze out white spaces
                        $keywords{$tmp[0]}=1;
                }
        }
        close(FILEKEYS);

        ##      We are using an associative array to eliminate any
        ##      duplicate keywords we might have in the input text file.
        open (KEYWORDS,">$keywords") || die "can't open $keywords: $!\n";
        foreach(sort keys %keywords){
             print KEYWORDS $_,"\n"; ##      Write to the keyword output file
             print "\n\n\n";


        }
        close(KEYWORDS);

        return 1;

}
#######################CREATE ANCHORS to the keywords##################
# This uses 4 files:
                open(KI, "<./tmp/keywords.out")   or die; # simple keywords, one per line
                open(KO, ">./tmp/keywords.html")  or die; # The htmlized keywords
                open(AI, "<./tmp/file.html") or die; # The original HTML document
                open(AO, ">./tmp/fileKeywords.html")        or die; # The bold/tagged HTML document

                @keywords = <KI>; # grab all the keywords
                chomp @keywords; chomp @keywords;  # Remove linefeeds

                # Make sure keywords are unique. I assume only 1 kw per document is needed
                @keywords = grep { !$seen{$_}++ } @keywords;

                print KO<<EOF; # This is the start of the keywords.html doc
                <HTML>
                <HEAD>
<style type="text/css">
A {text-decoration:none}
</style>
</head>
                  <title>This is the Keywords Document</title>
EOF
                undef $/; # turn of line-at-a-time processing, and suck up whole files
                # Assumption: You have enough RAM to load in fileKeywords.html into memory.

                ($head,$_) = split /<BODY/i, <AI>; # read in HTML.
                # Strip off everything before body tag, since we can't manipulate it

                foreach $k (@keywords)
                {
                  $k =~ s/\s//g; # No whitespace allowed in keyword (otherwise, need to
                  # mess around with the link -- it can't have spaces.)
                  print KO "<A HREF='http://jbh3-1.csci.csusb.edu/public/grad/sdesar/tmp/fileKeywords.html#$k' target=defsbox>$k</A><BR>\n"; # add outbound link
                  s!$k!<A NAME='$k'><B>$k</B></A>!; # Create inbound link
                # I assume that none of the keywords are subsets of the other keywords..
                }

                print AO "$head<BODY$_";


I think the problem is with above routine... CREATE ANCHORS to the KEYWORDS... the file - keywords.html is NOT being updated...
0
Comment
Question by:sdesar
10 Comments
 
LVL 3

Expert Comment

by:guadalupe
ID: 2491329
I think you've gotten low response on this one because it is not unclear what you want.  One idea is that you force the browser to refresh the page automatically every x seconds like this:

<META HTTP-EQUIV="Refresh" CONTENT="300">

This will do it every 300 seconds (5 min)

If this is not what you want try and explain a little more and I'll see if I can help...
0
 

Author Comment

by:sdesar
ID: 2491431
Oh .. good..
But I want the files to be refreshed in 1-5 seconds.
0
 
LVL 85

Expert Comment

by:ozo
ID: 2491460
Change the "300" to "5"
0
Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

 

Author Comment

by:sdesar
ID: 2491476
Where should I place this in the above
script?

<META HTTP-EQUIV="Refresh" CONTENT="5">
0
 
LVL 16

Expert Comment

by:maneshr
ID: 2491497
ok heres what you need to do.

    first before your click the fetch button, clear your disk and memory cache.

    alternately what you can do is... (assuming you are using Netscape)

    1 - fetch a URL.
    2 - when you get the results, move your mouse pointer over the hyperlink and right click.
    select "Open in new window".

    Now you will have 2 browser windows open. one with the Web texturizer interface and other
    wil the actual file.

    3 - now fetch another URL
    4 - when you get the results, just go to the other page, keep the shift key pressed and
    click on the reload icon of your browser.

    if you see the contents of the page changing, that means the reload is fine, its just the    cache that is giving you the problem.

0
 
LVL 3

Expert Comment

by:guadalupe
ID: 2491567
Or put the tag:

<META HTTP-EQUIV="Refresh" CONTENT="5">


between the head tags like this:

<head>

<META HTTP-EQUIV="Refresh" CONTENT="5">

<title>Title</title>

</head>

0
 

Author Comment

by:sdesar
ID: 2512480
Thanks for the suggestions but nothing seems to work .... the data in my keywords.html file is still from the previous url ... eventhoush the rest of the files -- fileKeyword.html and keywords.txt are updated.

I am not sure why is that?
0
 

Author Comment

by:sdesar
ID: 2512501
Edited text of question.
0
 

Accepted Solution

by:
logique earned 20 total points
ID: 2543169
Add <meta http-equiv=refresh content=5> and <meta http-equiv=plasma content=no-cache> to prevent cache-ing your old data in to the local computer.
0
 

Author Comment

by:sdesar
ID: 2543246
I tried your suggestion...Nothing seems to work... I think there maybe a Bug with the last part of the above script- ie create anchors to the keywords...
you can check out the behavior at   http://jbh3-1.csci.csusb.edu/public/grad/sdesar/url_bold.cgi  -- check the keywords ascii and keywords with anchors files.... once you enter 2 different urls.
and check out the behavior...


the files have different data... and they should have same keywords only difference is keywords.html has anchors created.

the script can be view at http://jbh3-1.csci.csusb.edu/public/grad/sdesar/url_bold.pl

awaiting a response
Thanks
0

Featured Post

Receive 1:1 tech help

Solve your biggest tech problems alongside global tech experts with 1:1 help.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Article by: Tammy
MySQLTuner is a script written in Perl that allows you to review a MySQL installation quickly and make adjustments to increase performance and stability. The current configuration variables and status data is retrieved and presented in a brief forma…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans
Suggested Courses

607 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question