sdesar
asked on
How to refresh Files?
How can I refresh the files that receive the outputs?
Currently I have to HIT the REFRESH BUTTON every time I enter a new URL and convert it to ASCII...
How can I refresh the files without having to hit REFRESH?
Here's the code... url.cgi
#!/usr/bin/perl
##-I/web/public/grad/sdesar
####################################################################################################
#This script does the following
#1. file.txt- converts html to ascii
#2. file.html- fetch the html file.
#3. fileParse.txt - Parse the articles, prepositions, etc. to create the keywords to be analyzed.
#4. fileHeader.html- gets the headers from file.html and creates anchors.
#5. stop_words- List of words which are to be parsed.... a, the, and, etc.,
# I got this from http://www.nzdl.org
#
#This file contains the following Routines
#
#1. convert to plain_text
#2. copy HTML file
#3. parse data
#4. get all the headers and place it after the <BODY> tag and make them anchors
#Here's the list of INPUT AND OUTPUT FILES-
#file.txt - ASCII file
#file.html - HTML file
#fileParse.txt- parses articles, prepositions etc
# Input file - file.txt, stop_words
# Output file - fileParse.txt
#fileHeader.html - Gets the Headers from HTML document
# Input file - file.html
# Output file - fileHeader.html
#fileKeys.txt - Displays the 5 most freq. words per paragraph
# and 10 most freq. words in the entire document.
# Input - fileParse.txt
# Output - fileKeys.txt
#fileKW.html - Displays the 10 most freq. words as BOLD in the Parsed file.
# Input - fileParse.txt
# Output - fileKW.html
#keywords.out - Parses the keywords from fileKeys.txt
# Input- fileKeys.txt
# Output- keywords.out
#fileBold.html - Bolds all the words.
# Input- file.html
# Input- keywords.out
# Output- fileBold.html
#find.html - Independent javascript to find keywords.
#
####################################################################################################
use LWP::Simple;
use HTML::Parser;
use CGI;
# require "boldparse.pl";
#require "wrapper.cgi";
my $cgi = new CGI;
my $url = $cgi->param('url');
my $HTTP_ROOT = "/web/public/grad/sdesar/tmp/file.txt";
my $HTTP_ROOT1 = "/web/public/grad/sdesar/tmp/file.html";
my $HTTP_ROOT2 = "/web/public/grad/sdesar/tmp/fileParse.txt";
if ( $url ne "" )
{
$content = fetch( $url );
if ( $content ne "" )
{
print $cgi->header( -type => 'text/plain' );
my $plain_text = plain_text($content);
print $plain_text;
print $content;
my $plain_text = plain_text( $content );
print $plain_text, "\n";
cp_to_file( $plain_text, "$HTTP_ROOT" );
print $cgi->redirect( "./tmp/file.txt" );
##########routine to parse data
print $cgi->header( -type => 'text/plain' );
open KW, 'stop_words';
@kw = map {chop;$_} <KW>;
close KW;
#form RE
$re = join '\b)|(\b','(\b', @kw,'\b)';
open TXT,'./tmp/file.txt';
while(<TXT>){
s/$re//goi;
$contentParse .= $_;
}
print $contentParse, "\n";
cp_to_file($contentParse, "$HTTP_ROOT2");
print $cgi->redirect( "./tmp/fileParse.txt" );
#print "\n\n\n";
##########copy the HTML
print $cgi->header( -type => 'text/plain' );
my $cp_to_file_html = cp_to_file_html($content);
print $cp_to_file_html;
print $content;
my $cp_to_file_html = cp_to_file_html( $content );
print $cp_to_file_html, "\n";
cp_to_file_html( $cp_to_file_html, "$HTTP_ROOT1" );
print $cgi->redirect( "./tmp/file.html" );
###############COPY TO KEYWORDS.out
print $cgi->header( -type => 'text/plain' );
$status=&parsenwrite("./tmp/fileKeys.txt","./tmp/keywords.out");
print $cgi->redirect( "./tmp/keywords.out" );
print $cgi->header( -type => 'text/plain' );
#print "Content-type: text/html\n\n";
if ($status){ ## The Parse'n Write sub-routine was fine
## Now read the html file and bold the keywords
&makebold("./tmp/file.html","./tmp/keywords.out","./tmp/fileBold.html");
print $cgi->redirect( "./tmp/fileBold.html" );
}else{
print "Error during parsewrite\n";
}
}
else
{
output_form( "Could not load URL: $url<br>" );
}
}
else
{
output_form( "Enter URL to fetch" );
}
sub output_form
{
my $msg = shift;
# output the html header
print $cgi->header( -type => 'text/html' );
# print the message if there is one
print "$msg<br>\n";
# output the form for the user
print $cgi->start_html;
print $cgi->start_form;
print "Please enter another URL: ";
print $cgi->textfield( -name=>'url', -value=>'http://www.' );
#print $cgi->textfield('url');
print $cgi->br;
print $cgi->submit( -label => 'Fetch' );
print $cgi->end_form;
print $cgi->end_html;
}
print <<"PrintTag";
<html><head>
<title>CGI-Generated HTML</title>
</head><body>
<H2 align="center">WEB TEXTURIZER</H2>
<HR>
<H2> The following files will be created: </H2>
<H3> Please hit RELOAD to REFRESH these files. </H3>
<UL>
<LI><A HREF="./tmp/file.txt"
TARGET="results">
Text Only Version</A>
<LI><A HREF="./tmp/file.html"
TARGET="results">
HTML Version</A>
<LI><A HREF="./tmp/fileKW.html"
TARGET="results">
KeyWords and BOLD Them-in Parsed File-- Do it in HTML-Headers</A>
<LI><A HREF="./tmp/fileParse.txt"
TARGET="results">
Parsed Version</A>
<LI><A HREF="./tmp/fileHeader.html"
TARGET="results">
HTML along with the Headers that have anchors created </A>
<LI><A HREF="./tmp/find.html"
TARGET="results">
HTML and find the keywords -- JavaScript</A>
<LI><A HREF="./tmp/fileKeys.txt"
TARGET="results">
Finds the most frequent words per Paragraph and Total</A>
<LI><A HREF="./tmp/keywords.out"
TARGET="results">
Keywords in ASCII</A>
<LI><A HREF="./tmp/keywords.html"
TARGET="results">
Keywords in HTML with anchors created Version</A>
<LI><A HREF="./tmp/fileKeywords.html"
TARGET="results">
Finds Keywords and creates the anchors in BOLD HTML Version</A>
<LI><A HREF="/public/grad/sdesar/wrapper.cgi?f=file.txt">
Text Only Version - file.txt</A>
</UL>
<HR>
</body></html>
PrintTag
#Line above has the magic word that
#makes the browser stop printing
#End of program
#print "Content-type: text/plain \n\n";
#print "TEST";
#### subroutines
sub fetch {
my ($url) = @_;
my $cont;
$cont = get($url);
return $cont;
}
# copies text to file
sub cp_to_file {
my ($text, $to_file) = @_;
open(OUT, ">" . $to_file);
print OUT $text;
close(OUT);
}
# copies file at HTML to a file
sub cp_to_file_html {
my ($text, $to_file) = @_;
open(OUT, ">" . $to_file);
print OUT $text;
return $text;
close(OUT);
}
# converts html text into plain text; (simplistic approach)
sub plain_text {
my ($in_text) = @_;
my $plain;
($plain = $in_text) =~ s/<[^>]*>//gs;
return $plain;
}
##############Clear cache
use CGI;
$query=new CGI;
my $file_name=$query->param('f');
my $file_path="/web/public/grad/sdesar/tmp/";
open(OUT,$file_path.$file_name) || die $!;
print "Content-type: text/html\n\n" if $file_name!~ /\.txt/;
print "Content-type: text/plain\n\n" if $file_name=~ /\.txt/;
print "<meta http-equiv=\"Pragma\" content=\"no-cache\">
<meta http-equiv=\"expires\" content=\"0\">";
while(<OUT>){print $_;}
close(OUT);
##############SUBROUTINE to add anchors to headers
open(FILE, "./tmp/file.html");
# open(FILE, "$ARGV[0]");
# @File = <FILE>;
@File = <FILE>;
$html = join(" ", @File);
close (FILE);
#Match Headers
(@headers) =($html=~m!<H\d>\s*(.*?)\s*</H\d>!isg);
#Convert all headers into named anchors
$html =~ s!(<H\d>\s*)(.*?)(\s*</H\d>)!$1<a name="$2">$2</a>$3!isg;
#Construct links to headers
foreach $header (@headers)
{
$links .= qq(\n<a href="#$header">$header</a><br>\n);
}
#Place links at top of page after <Body> tag
$html =~ s/(<body[^>]*>)/$1$links/i;
print $html;
#Write out new document
cp_to_file_html($html, "./tmp/fileHeader.html");
print "\n\n\n";
#############SUBROUTINE FOR KEYWORDS
sub keywords{
my $file = shift;
open FILE,"<$file" or die "can't open $file : $!";
my %wc=();
my %seen = ();
my %top;
my @words;
my $paragraphs='';
local $/='';
my @paragraphs = <FILE>;
close FILE;
for( @paragraphs ){
while( /(\w['\w-]*)/g ){
$seen{lc $1}++;
}
}
@top{(sort {$seen{$b} <=> $seen{$a} } keys %seen)[0..9]} = ();
for( @paragraphs ){
%wc = ();
if( @words = grep {exists $top{lc $_} && !$wc{lc $_}++} /(\w['\w-]*)/g ){
for my $w ( @words ){ s/\b(\Q$w\E)\b/<b>$1<\/b>/gi; }
$paragraphs .= join ' ',"<h1>",@words,"</h1>:\n$_";
}
}
$paragraphs=~s/$/<br>/gm;
return $paragraphs;
}
$paragraphs=keywords('./tmp/fileParse.txt');
print "$_:$paragraphs";
open FILE2, ">./tmp/fileKW.html" or die "can't open fileKW because $!";
print FILE2 keywords("./tmp/fileParse.txt");
close FILE2;
#########################Routine to count the words per Paragraph and total
open IN,"<./tmp/fileParse.txt" or die "can't open fileParse.txt:$!";
open OUT,">./tmp/fileKeys.txt" or die "can't open fileKeys.txt:$!";
{local $/='';
while( <IN> ){
%wc = ();
while( /(\w['\w-]*)/g ){
$seen{lc $1}++;
$wc{lc $1}++;
}
print OUT "paragraph $.\n";
for( (sort {$wc{$b} <=> $wc{$a} } keys %wc)[0..4] ){
print OUT "$_ : $wc{$_}\n";
}
}
}
print OUT "total\n";
for( (sort {$seen{$b} <=> $seen{$a} } keys %seen)[0..9] ){
printf OUT "%5d %s\n", $seen{$_}, $_;
}
################ROUTINE FOR BOLDFACE THE KEYWORDS IN HTML FILE
# require "boldparse.pl";
sub parsenwrite{
($filekeys,$keywords)=@_;
open (FILEKEYS,$filekeys) || die "can't open $filekeys: $!\n";
$ctr=0;
while ($line=<FILEKEYS>){
$ctr++;
chomp($line); ## Remove the \n char
## Ignore lines with paragraph 1, paragraph 2 etc & ...
## line having only a ":" in it.
next if $line=~ /^paragraph\s+\d+/ || $line=~ /^:$/;
## Check for lines which have white spaces followed by
## numbers and then have a word. Eg. 20 information
if ($line=~ /\s+\d+\s+(.*)/){
$keywords{$1}=1;
next; ## Go for the next line
}
## All remaining lines WILL have the foll format
## word : number
@tmp=split(/:/,$line);
if ($#tmp>0){ ## The line has the above format.
$tmp[0]=~ s/\s+//g; ## Squeeze out white spaces
$keywords{$tmp[0]}=1;
}
}
close(FILEKEYS);
## We are using an associative array to eliminate any
## duplicate keywords we might have in the input text file.
open (KEYWORDS,">$keywords") || die "can't open $keywords: $!\n";
foreach(sort keys %keywords){
print KEYWORDS $_,"\n"; ## Write to the keyword output file
print "\n\n\n";
}
close(KEYWORDS);
return 1;
}
#######################CREATE ANCHORS to the keywords##################
# This uses 4 files:
open(KI, "<./tmp/keywords.out") or die; # simple keywords, one per line
open(KO, ">./tmp/keywords.html") or die; # The htmlized keywords
open(AI, "<./tmp/file.html") or die; # The original HTML document
open(AO, ">./tmp/fileKeywords.html") or die; # The bold/tagged HTML document
@keywords = <KI>; # grab all the keywords
chomp @keywords; # Remove linefeeds
# Make sure keywords are unique. I assume only 1 kw per document is needed
@keywords = grep { !$seen{$_}++ } @keywords;
print KO<<EOF; # This is the start of the keywords.html doc
<HTML>
<HEAD>
<style type="text/css">
A {text-decoration:none}
</style>
</head>
<title>This is the Keywords Document</title>
EOF
undef $/; # turn off line-at-a-time processing, and suck up whole files
# Assumption: You have enough RAM to load in fileKeywords.html into memory.
($head,$_) = split /<BODY/i, <AI>; # read in HTML.
# Strip off everything before body tag, since we can't manipulate it
foreach $k (@keywords)
{
$k =~ s/\s//g; # No whitespace allowed in keyword (otherwise, need to
# mess around with the link -- it can't have spaces.)
print KO "<A HREF='http://jbh3-1.csci.csusb.edu/public/grad/sdesar/tmp/fileKeywords.html#$k' target=defsbox>$k</A><BR>\n"; # add outbound link
s!$k!<A NAME='$k'><B>$k</B></A>!; # Create inbound link
# I assume that none of the keywords are subsets of the other keywords..
}
print AO "$head<BODY$_";
I think the problem is with the above routine... CREATE ANCHORS to the KEYWORDS... the file keywords.html is NOT being updated...
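One thing worth checking (a sketch of a suspected cause, not a confirmed diagnosis): the CREATE ANCHORS routine opens the KO and AO handles but never closes them, so buffered output can still be sitting in memory when the browser fetches keywords.html. Closing, or autoflushing, every output handle before the files are linked to is cheap insurance. The demo below uses a throwaway file name; the handle hygiene is the point.

```perl
#!/usr/bin/perl
# Sketch: always close (or autoflush) an output handle before anything
# else reads the file it produces. File name here is a stand-in.
use strict;
use warnings;
use IO::Handle;   # enables $fh->autoflush on older perls

my $file = "demo_keywords.html";
open(my $ko, '>', $file) or die "can't open $file: $!";
$ko->autoflush(1);                      # flush every print immediately
print $ko "<A NAME='perl'><B>perl</B></A>\n";
close($ko) or die "close failed: $!";   # guarantees the data is on disk

open(my $in, '<', $file) or die "can't reopen $file: $!";
print scalar <$in>;                     # prints back the line we wrote
close($in);
unlink $file;
```

If the file is complete on disk after `close`, whatever the browser still shows is a caching problem rather than a script problem.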
ASKER
Oh .. good..
But I want the files to be refreshed in 1-5 seconds.
Change the "300" to "5"
ASKER
Where should I place this in the above script?
<META HTTP-EQUIV="Refresh" CONTENT="5">
OK, here's what you need to do.
First, before you click the Fetch button, clear your disk and memory cache.
Alternately, what you can do is... (assuming you are using Netscape):
1 - Fetch a URL.
2 - When you get the results, move your mouse pointer over the hyperlink, right-click, and select "Open in new window". Now you will have two browser windows open: one with the Web Texturizer interface and the other with the actual file.
3 - Now fetch another URL.
4 - When you get the results, go to the other page, keep the Shift key pressed, and click on the Reload icon of your browser.
If you see the contents of the page changing, that means the reload is fine; it's just the cache that is giving you the problem.
Or put the tag:
<META HTTP-EQUIV="Refresh" CONTENT="5">
between the head tags like this:
<head>
<META HTTP-EQUIV="Refresh" CONTENT="5">
<title>Title</title>
</head>
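Besides the META tag, the server side can forbid caching outright: the CGI script can print no-cache HTTP headers before the blank line that ends the header block, the same way the script already prints raw Content-type headers. A minimal sketch (these are standard HTTP header names; nothing here is specific to the script):

```perl
#!/usr/bin/perl
# Sketch: emit no-cache HTTP headers so the browser refetches the page
# instead of serving a stale cached copy.
use strict;
use warnings;

print "Content-type: text/html\r\n";
print "Pragma: no-cache\r\n";                          # HTTP/1.0 caches
print "Cache-Control: no-cache, must-revalidate\r\n";  # HTTP/1.1 caches
print "Expires: Thu, 01 Jan 1970 00:00:00 GMT\r\n";    # already expired
print "\r\n";                  # blank line ends the header block
print "<html><body>fresh every time</body></html>\n";
```

With these headers the browser should not need a manual RELOAD at all, which is closer to what the question asks for than a timed META Refresh.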
ASKER
Thanks for the suggestions, but nothing seems to work... the data in my keywords.html file is still from the previous URL, even though the rest of the files -- fileKeyword.html and keywords.txt -- are updated.
I am not sure why that is?
ASKER
Edited text of question.
ASKER CERTIFIED SOLUTION
ASKER
I tried your suggestion... Nothing seems to work... I think there may be a bug in the last part of the above script, i.e. create anchors to the keywords...
You can check out the behavior at http://jbh3-1.csci.csusb.edu/public/grad/sdesar/url_bold.cgi -- check the "keywords ASCII" and "keywords with anchors" files once you enter 2 different URLs,
and check out the behavior...
The files have different data, and they should have the same keywords; the only difference is that keywords.html has anchors created.
The script can be viewed at http://jbh3-1.csci.csusb.edu/public/grad/sdesar/url_bold.pl
Awaiting a response.
Thanks
<META HTTP-EQUIV="Refresh" CONTENT="300">
This will do it every 300 seconds (5 min).
If this is not what you want, try and explain a little more and I'll see if I can help...
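To apply this to the script above, the Refresh tag can simply be printed inside the <head> of the heredoc that emits the file list. A sketch (the 5-second interval is an example value, and the page title is made up for the demo):

```perl
#!/usr/bin/perl
# Sketch: a page that asks the browser to re-request it automatically.
use strict;
use warnings;

my $interval = 5;   # seconds between automatic reloads
print <<"HTML";
<html><head>
<META HTTP-EQUIV="Refresh" CONTENT="$interval">
<title>Auto-refreshing file list</title>
</head><body>
This page reloads every $interval seconds.
</body></html>
HTML
```

Note this only refreshes the page carrying the tag; each output file the links point at would need its own Refresh tag (or no-cache headers) to reload itself.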