sdesar
asked on
How to refresh Files?
How can I refresh the files that receive the outputs?
Currently I have to HIT the REFRESH BUTTON every time I enter a new URL and convert it to ASCII...
How can I refresh the files without having to hit REFRESH?
Here's the code... url.cgi
#!/usr/bin/perl
##-I/web/public/grad/sdesar
####################################################################################################
#This script does the following
#1. file.txt- converts html to ascii
#2. file.html- fetch the html file.
#3. fileParse.txt - Parse the articles, prepositions, etc. to create the keywords to be analyzed.
#4. fileHeader.html- gets the headers from file.html and creates anchors.
#5. stop_words- List of words which are to be parsed.... a, the, and, etc.,
# I got this from http://www.nzdl.org
#
#This file contains the following Routines
#
#1. convert to plain_text
#2. copy HTML file
#3. parse data
#4. get all the headers and place it after the <BODY> tag and make them anchors
#Here's the list of INPUT AND OUTPUT FILES-
#file.txt - ASCII file
#file.html - HTML file
#fileParse.txt- parses articles, prepositions etc
# Input file - file.txt, stop_words
# Output file - fileParse.txt
#fileHeader.html - Gets the Headers from HTML document
# Input file - file.html
# Output file - fileHeader.html
#fileKeys.txt - Displays the 5 most freq. words per paragraph
# and 10 most freq. words in the entire document.
# Input - fileParse.txt
# Output - fileKeys.txt
#fileKW.html - Displays the 10 most freq. words as BOLD in the Parsed file.
# Input - fileParse.txt
# Output - fileKW.html
#keywords.out - Parses the keywords from fileKeys.txt
# Input- fileKeys.txt
# Output- keywords.out
#fileBold.html - Bolds all the words.
# Input- file.html
# Input- keywords.out
# Output- fileBold.html
#find.html - Independent javascript to find keywords.
#
####################################################################################################
use LWP::Simple;
use HTML::Parser;
use CGI;
# require "boldparse.pl";
#require "wrapper.cgi";
my $cgi = new CGI;
my $url = $cgi->param('url');
my $HTTP_ROOT = "/web/public/grad/sdesar/tmp/file.txt";
my $HTTP_ROOT1 = "/web/public/grad/sdesar/tmp/file.html";
my $HTTP_ROOT2 = "/web/public/grad/sdesar/tmp/fileParse.txt";
if ( $url ne "" )
{
$content = fetch( $url );
if ( $content ne "" )
{
print $cgi->header( -type => 'text/plain' );
my $plain_text = plain_text($content);
print $plain_text;
print $content;
my $plain_text = plain_text( $content );
print $plain_text, "\n";
cp_to_file( $plain_text, "$HTTP_ROOT" );
print $cgi->redirect( "./tmp/file.txt" );
##########routine to parse data
print $cgi->header( -type => 'text/plain' );
open KW, 'stop_words';
@kw = map {chop;$_} <KW>;
close KW;
#form RE
$re = join '\b)|(\b','(\b', @kw,'\b)';
open TXT,'./tmp/file.txt';
while(<TXT>){
s/$re//goi;
$contentParse .= $_;
}
print $contentParse, "\n";
cp_to_file($contentParse, "$HTTP_ROOT2");
print $cgi->redirect( "./tmp/fileParse.txt" );
#print "\n\n\n";
##########copy the HTML
print $cgi->header( -type => 'text/plain' );
my $cp_to_file_html = cp_to_file_html($content);
print $cp_to_file_html;
print $content;
my $cp_to_file_html = cp_to_file_html( $content );
print $cp_to_file_html, "\n";
cp_to_file_html( $cp_to_file_html, "$HTTP_ROOT1" );
print $cgi->redirect( "./tmp/file.html" );
###############COPY TO KEYWORDS.out
print $cgi->header( -type => 'text/plain' );
$status=&parsenwrite("./tmp/fileKeys.txt","./tmp/keywords.out");
print $cgi->redirect( "./tmp/keywords.out" );
print $cgi->header( -type => 'text/plain' );
#print "Content-type: text/html\n\n";
if ($status){ ## The Parse'n Write sub-routine was fine
## Now read the html file and bold the keywords
&makebold("./tmp/file.html","./tmp/keywords.out","./tmp/fileBold.html");
print $cgi->redirect( "./tmp/fileBold.html" );
}else{
print "Error during parsewrite\n";
}
}
else
{
output_form( "Could not load URL: $url<br>" );
}
}
else
{
output_form( "Enter URL to fetch" );
}
sub output_form
{
my $msg = shift;
# output the html header
print $cgi->header( -type => 'text/html' );
# print the message if there is one
print "$msg<br>\n";
# output the form for the user
print $cgi->start_html;
print $cgi->start_form;
print "Please enter another URL: ";
print $cgi->textfield( -name=>'url', -value=>'http://www.' );
#print $cgi->textfield('url');
print $cgi->br;
print $cgi->submit( -label => 'Fetch' );
print $cgi->end_form;
print $cgi->end_html;
}
print <<"PrintTag";
<html><head>
<title>CGI-Generated HTML</title>
</head><body>
<H2 align="center">WEB TEXTURIZER</H2>
<HR>
<H2> The following files will be created: </H2>
<H3> Please hit RELOAD to REFRESH these files. </H3>
<UL>
<LI><A HREF="./tmp/file.txt"
TARGET="results">
Text Only Version</A>
<LI><A HREF="./tmp/file.html"
TARGET="results">
HTML Version</A>
<LI><A HREF="./tmp/fileKW.html"
TARGET="results">
KeyWords and BOLD Them-in Parsed File-- Do it in HTML-Headers</A>
<LI><A HREF="./tmp/fileParse.txt"
TARGET="results">
Parsed Version</A>
<LI><A HREF="./tmp/fileHeader.html"
TARGET="results">
HTML along with the Headers that have anchors created </A>
<LI><A HREF="./tmp/find.html"
TARGET="results">
HTML and find the keywords -- JavaScript</A>
<LI><A HREF="./tmp/fileKeys.txt"
TARGET="results">
Finds the most frequent words per Paragraph and Total</A>
<LI><A HREF="./tmp/keywords.out"
TARGET="results">
Keywords in ASCII</A>
<LI><A HREF="./tmp/keywords.html"
TARGET="results">
Keywords in HTML with anchors created Version</A>
<LI><A HREF="./tmp/fileKeywords.html"
TARGET="results">
Finds Keywords and creates the anchors in BOLD HTML Version</A>
<LI><A HREF="/public/grad/sdesar/wrapper.cgi?f=file.txt">
Text Only Version - file.txt</A>
</UL>
<HR>
</body></html>
PrintTag
#Line above has the magic word that
#makes the browser stop printing
#End of program
#print "Content-type: text/plain \n\n";
#print "TEST";
#### subroutines
sub fetch {
my ($url) = @_;
my $cont;
$cont = get($url);
return $cont;
}
# copies text to file
sub cp_to_file {
my ($text, $to_file) = @_;
open(OUT, ">" . $to_file);
print OUT $text;
close(OUT);
}
# copies file at HTML to a file
sub cp_to_file_html {
my ($text, $to_file) = @_;
open(OUT, ">" . $to_file);
print OUT $text;
return $text;
close(OUT);
}
# converts html text into plain text; (simplistic approach)
sub plain_text {
my ($in_text) = @_;
my $plain;
($plain = $in_text) =~ s/<[^>]*>//gs;
return $plain;
}
##############Clear cache
use CGI;
$query=new CGI;
my $file_name=$query->param('f');
my $file_path="/web/public/grad/sdesar/tmp/";
open(OUT,$file_path.$file_name) || die $!;
print "Content-type: text/html\n\n" if $file_name!~ /\.txt/;
print "Content-type: text/plain\n\n" if $file_name=~ /\.txt/;
print "<meta http-equiv=\"Pragma\" content=\"no-cache\">
<meta http-equiv=\"expires\" content=\"0\">";
while(<OUT>){print $_;}
close(OUT);
##############SUBROUTINE to add anchors to headers
open(FILE, "./tmp/file.html");
# open(FILE, "$ARGV[0]");
# @File = <FILE>;
@File = <FILE>;
$html = join(" ", @File);
close (FILE);
#Match Headers
(@headers) =($html=~m!<H\d>\s*(.*?)\s*</H\d>!isg);
#Convert all headers into named anchors
$html =~ s!(<H\d>\s*)(.*?)(\s*</H\d>)!$1<a name="$2">$2</a>$3!isg;
#Construct links to headers
foreach $header (@headers)
{
$links .= qq(\n<a href="#$header">$header</a><br>\n);
}
#Place links at top of page after <Body> tag
$html =~ s/(<body[^>]*>)/$1$links/i;
print $html;
#Write out new document
cp_to_file_html($html, "./tmp/fileHeader.html");
print "\n\n\n";
#############SUBROUTINE FOR KEYWORDS
sub keywords{
my $file = shift;
open FILE,"<$file" or die "can't open $file : $!";
my %wc=();
my %seen = ();
my %top;
my @words;
my $paragraphs='';
local $/='';
my @paragraphs = <FILE>;
close FILE;
for( @paragraphs ){
while( /(\w['\w-]*)/g ){
$seen{lc $1}++;
}
}
@top{(sort {$seen{$b} <=> $seen{$a} } keys %seen)[0..9]} = ();
for( @paragraphs ){
%wc = ();
if( @words = grep {exists $top{lc $_} && !$wc{lc $_}++} /(\w['\w-]*)/g ){
for my $w ( @words ){ s/\b(\Q$w\E)\b/<b>$1<\/b>/gi; }
$paragraphs .= join ' ',"<h1>",@words,"</h1>:\n$_";
}
}
$paragraphs=~s/$/<br>/gm;
return $paragraphs;
}
$paragraphs=keywords('./tmp/fileParse.txt');
print "$_:$paragraphs";
open FILE2, ">./tmp/fileKW.html" or die "can't open fileKW because $!";
print FILE2 keywords("./tmp/fileParse.txt");
close FILE2;
#########################Routine to count the words per Paragraph and total
open IN,"<./tmp/fileParse.txt" or die "can't open fileParse.txt:$!";
open OUT,">./tmp/fileKeys.txt" or die "can't open fileKeys.txt:$!";
{local $/='';
while( <IN> ){
%wc = ();
while( /(\w['\w-]*)/g ){
$seen{lc $1}++;
$wc{lc $1}++;
}
print OUT "paragraph $.\n";
for( (sort {$wc{$b} <=> $wc{$a} } keys %wc)[0..4] ){
print OUT "$_ : $wc{$_}\n";
}
}
}
print OUT "total\n";
for( (sort {$seen{$b} <=> $seen{$a} } keys %seen)[0..9] ){
printf OUT "%5d %s\n", $seen{$_}, $_;
}
################ROUTINE FOR BOLDFACE THE KEYWORDS IN HTML FILE
# require "boldparse.pl";
sub parsenwrite{
($filekeys,$keywords)=@_;
open (FILEKEYS,$filekeys) || die "can't open $filekeys: $!\n";
$ctr=0;
while ($line=<FILEKEYS>){
$ctr++;
chomp($line); ## Remove the \n char
## Ignore lines with paragraph 1, paragraph 2 etc & ...
## line having only a ":" in it.
next if $line=~ /^paragraph\s+\d+/ || $line=~ /^:$/;
## Check for lines which have white spaces followed by
## numbers and then have a word. Eg. 20 information
if ($line=~ /\s+\d+\s+(.*)/){
$keywords{$1}=1;
next; ## Go for the next line
}
## All remaining lines WILL have the foll format
## word : number
@tmp=split(/:/,$line);
if ($#tmp>0){ ## The line has the above format.
$tmp[0]=~ s/\s+//g; ## Squeeze out white spaces
$keywords{$tmp[0]}=1;
}
}
close(FILEKEYS);
## We are using an associative array to eliminate any
## duplicate keywords we might have in the input text file.
open (KEYWORDS,">$keywords") || die "can't open $keywords: $!\n";
foreach(sort keys %keywords){
print KEYWORDS $_,"\n"; ## Write to the keyword output file
print "\n\n\n";
}
close(KEYWORDS);
return 1;
}
#######################CREATE ANCHORS to the keywords##################
# This uses 4 files:
open(KI, "<./tmp/keywords.out") or die; # simple keywords, one per line
open(KO, ">./tmp/keywords.html") or die; # The htmlized keywords
open(AI, "<./tmp/file.html") or die; # The original HTML document
open(AO, ">./tmp/fileKeywords.html") or die; # The bold/tagged HTML document
@keywords = <KI>; # grab all the keywords
chomp @keywords; # Remove linefeeds
# Make sure keywords are unique. I assume only 1 kw per document is needed
@keywords = grep { !$seen{$_}++ } @keywords;
print KO<<EOF; # This is the start of the keywords.html doc
<HTML>
<HEAD>
<style type="text/css">
A {text-decoration:none}
</style>
</head>
<title>This is the Keywords Document</title>
EOF
undef $/; # turn off line-at-a-time processing, and suck up whole files
# Assumption: You have enough RAM to load in fileKeywords.html into memory.
($head,$_) = split /<BODY/i, <AI>; # read in HTML.
# Strip off everything before body tag, since we can't manipulate it
foreach $k (@keywords)
{
$k =~ s/\s//g; # No whitespace allowed in keyword (otherwise, need to
# mess around with the link -- it can't have spaces.)
print KO "<A HREF='http://jbh3-1.csci.csusb.edu/public/grad/sdesar/tmp/fileKeywords.html#$k' target=defsbox>$k</A><BR>\n"; # add outbound link
s!$k!<A NAME='$k'><B>$k</B></A>!; # Create inbound link
# I assume that none of the keywords are subsets of the other keywords..
}
print AO "$head<BODY$_";
I think the problem is with the above routine... CREATE ANCHORS to the KEYWORDS... the file keywords.html is NOT being updated...
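One thing worth checking (a sketch of a suspected cause, not a confirmed diagnosis): the CREATE ANCHORS routine opens the KO and AO handles but never closes them, so buffered output can still be sitting in memory when the browser fetches keywords.html. Closing, or autoflushing, every output handle before the files are linked to is cheap insurance. The demo below uses a throwaway file name; the handle hygiene is the point.

```perl
#!/usr/bin/perl
# Sketch: always close (or autoflush) an output handle before anything
# else reads the file it produces. File name here is a stand-in.
use strict;
use warnings;
use IO::Handle;   # enables $fh->autoflush on older perls

my $file = "demo_keywords.html";
open(my $ko, '>', $file) or die "can't open $file: $!";
$ko->autoflush(1);                      # flush every print immediately
print $ko "<A NAME='perl'><B>perl</B></A>\n";
close($ko) or die "close failed: $!";   # guarantees the data is on disk

open(my $in, '<', $file) or die "can't reopen $file: $!";
print scalar <$in>;                     # prints back the line we wrote
close($in);
unlink $file;
```

If the file is complete on disk after `close`, whatever the browser still shows is a caching problem rather than a script problem.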
ASKER
Oh .. good..
But I want the files to be refreshed in 1-5 seconds.
Change the "300" to "5"
ASKER
Where should I place this in the above script?
<META HTTP-EQUIV="Refresh" CONTENT="5">
OK, here's what you need to do.
First, before you click the Fetch button, clear your disk and memory cache.
Alternately, what you can do is... (assuming you are using Netscape):
1 - Fetch a URL.
2 - When you get the results, move your mouse pointer over the hyperlink, right-click, and select "Open in new window". Now you will have two browser windows open: one with the Web Texturizer interface and the other with the actual file.
3 - Now fetch another URL.
4 - When you get the results, go to the other page, keep the Shift key pressed, and click on the Reload icon of your browser.
If you see the contents of the page changing, that means the reload is fine; it's just the cache that is giving you the problem.
Or put the tag:
<META HTTP-EQUIV="Refresh" CONTENT="5">
between the head tags like this:
<head>
<META HTTP-EQUIV="Refresh" CONTENT="5">
<title>Title</title>
</head>
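Besides the META tag, the server side can forbid caching outright: the CGI script can print no-cache HTTP headers before the blank line that ends the header block, the same way the script already prints raw Content-type headers. A minimal sketch (these are standard HTTP header names; nothing here is specific to the script):

```perl
#!/usr/bin/perl
# Sketch: emit no-cache HTTP headers so the browser refetches the page
# instead of serving a stale cached copy.
use strict;
use warnings;

print "Content-type: text/html\r\n";
print "Pragma: no-cache\r\n";                          # HTTP/1.0 caches
print "Cache-Control: no-cache, must-revalidate\r\n";  # HTTP/1.1 caches
print "Expires: Thu, 01 Jan 1970 00:00:00 GMT\r\n";    # already expired
print "\r\n";                  # blank line ends the header block
print "<html><body>fresh every time</body></html>\n";
```

With these headers the browser should not need a manual RELOAD at all, which is closer to what the question asks for than a timed META Refresh.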
ASKER
Thanks for the suggestions, but nothing seems to work... the data in my keywords.html file is still from the previous URL, even though the rest of the files -- fileKeyword.html and keywords.txt -- are updated.
I am not sure why that is?
ASKER
Edited text of question.
ASKER CERTIFIED SOLUTION
ASKER
I tried your suggestion... Nothing seems to work... I think there may be a bug in the last part of the above script, i.e. create anchors to the keywords...
You can check out the behavior at http://jbh3-1.csci.csusb.edu/public/grad/sdesar/url_bold.cgi -- check the "keywords ASCII" and "keywords with anchors" files once you enter 2 different URLs,
and check out the behavior...
The files have different data, and they should have the same keywords; the only difference is that keywords.html has anchors created.
The script can be viewed at http://jbh3-1.csci.csusb.edu/public/grad/sdesar/url_bold.pl
Awaiting a response.
Thanks
<META HTTP-EQUIV="Refresh" CONTENT="300">
This will do it every 300 seconds (5 min).
If this is not what you want, try and explain a little more and I'll see if I can help...
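To apply this to the script above, the Refresh tag can simply be printed inside the <head> of the heredoc that emits the file list. A sketch (the 5-second interval is an example value, and the page title is made up for the demo):

```perl
#!/usr/bin/perl
# Sketch: a page that asks the browser to re-request it automatically.
use strict;
use warnings;

my $interval = 5;   # seconds between automatic reloads
print <<"HTML";
<html><head>
<META HTTP-EQUIV="Refresh" CONTENT="$interval">
<title>Auto-refreshing file list</title>
</head><body>
This page reloads every $interval seconds.
</body></html>
HTML
```

Note this only refreshes the page carrying the tag; each output file the links point at would need its own Refresh tag (or no-cache headers) to reload itself.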