How to print the paragraphs that contain keywords?


This works, but how can I use these keywords to skip to the
paragraphs that contain these words?

Any suggestions?


/// OZO's code


#!/usr/bin/perl -00 -n
%wc = ();
while( /(\w['\w-]*)/g ){
    $seen{lc $1}++;
    $wc{lc $1}++;
}
print "paragraph $.\n";
for( (sort {$wc{$b} <=> $wc{$a} } keys %wc)[0..4] ){
    print "$_ : $wc{$_}\n";
}
END{
    print "total\n";
    for( (sort {$seen{$b} <=> $seen{$a} } keys %seen)[0..9] ){
        printf "%5d %s\n", $seen{$_}, $_;
    }
}
sdesar Asked:
 
ozo Commented:
#!/usr/bin/perl -00 -n
%wc = ();
while( /(\w['\w-]*)/g ){
    $seen{lc $1}++;
    push @{$paragraphs{lc $1}},$_ unless $wc{lc $1}++;
}
END{
    for( (sort {$seen{$b} <=> $seen{$a} } keys %seen)[0..9] ){
        print "$_:\n@{$paragraphs{$_}}\n";
    }
}
 
sdesar (Author) Commented:
Adjusted points to 20
 
sdesar (Author) Commented:
I want to make this into a subroutine so that I can call it from within a CGI script.

The following does not output the keywords and paragraphs when I make it into a subroutine.
It works very well when the file is run from the command prompt, when it's not a subroutine.


#keywords.pl
#!/usr/bin/perl -00 -n

sub keywords{

my ($wc)=@_;
my $paragraphs;

%wc = ();
while( /(\w['\w-]*)/g ){
    $seen{lc $1}++;
    push @{$paragraphs{lc $1}},$_ unless $wc{lc $1}++;
}
END{
    for( (sort {$seen{$b} <=> $seen{$a} } keys %seen)[0..9] ){
       #  printf "$_:\n@{$paragraphs{$_}}\n";
    }
}

$paragraphs="\n@{$paragraphs{$_}}\n";
$paragraphs=~s/\b(\Q$_\E)\b/<b>$1<\/b>/gi;
$paragraphs=~s/$/<br>/gm;
print "$_:$paragraphs";


} #end sub keywords

#!/usr/bin/perl
#keywords.cgi
#get the keywords from a INFILE
#List the Keywords
#Make them Bold
#Print the words to OUTFILE

$WC=shift;
$parah=keywords($WC);
print "$_:$parah";


 
ozo Commented:
What are $parah and $WC when you call $parah=keywords($WC);?
 
sdesar (Author) Commented:
$parah is a variable defined for $paragraphs.
$WC is defined for $wc.
I want to turn the following code into a subroutine so I can call it within a CGI script ...

                      #!/usr/bin/perl -00 -n
                      %wc = ();
                      while( /(\w['\w-]*)/g ){
                          $seen{lc $1}++;
                          push @{$paragraphs{lc $1}},$_ unless $wc{lc $1}++;
                      }
                      END{
                          for( (sort {$seen{$b} <=> $seen{$a} } keys %seen)[0..9] ){
                             #  printf "$_:\n@{$paragraphs{$_}}\n";
                          }
                      }

                      $paragraphs="\n@{$paragraphs{$_}}\n";
                      $paragraphs=~s/\b(\Q$_\E)\b/<b>$1<\/b>/gi;
                      $paragraphs=~s/$/<br>/gm;
                      print "$_:$paragraphs";

PS...Thanks OZO for your help.
 
sdesar (Author) Commented:
How can I make this code into a subroutine and then call it from a CGI script?

Thanks a million!!
 
ozo Commented:
> $WC is defined for $wc
That doesn't help much if I don't know what $wc is for.
I'll assume it is the name of a file containing the paragraphs:

sub keywords{
    my $file = shift;
    open FILE,"<$file" or die "can't open $file : $!";
    local $/='';
    my %wc;
    my %seen = ();
    my %paragraphs=();
    my $paragraphs='';
    my $paragraph;
    while( <FILE> ){
        %wc=();
        while( /(\w['\w-]*)/g ){
            $seen{lc $1}++;
            push @{$paragraphs{lc $1}},$_ unless $wc{lc $1}++;
        }
    }
    for( (sort {$seen{$b} <=> $seen{$a} } keys %seen)[0..9] ){
        $paragraph =  "$_:\n@{$paragraphs{$_}}\n";
        $paragraph=~s/\b(\Q$_\E)\b/<b>$1<\/b>/gi;
        $paragraph=~s/$/<br>/gm;
        $paragraphs .= $paragraph;
    }
    return $paragraphs;
}
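
The `local $/='';` line is what stands in for the `-00` switch once the code lives in a subroutine, and the explicit `while( <FILE> )` loop stands in for `-n`. A minimal sketch of paragraph mode on its own, reading from an in-memory string instead of a file (the helper name `read_paragraphs` and the sample text are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into paragraphs using paragraph mode.
sub read_paragraphs {
    my ($text) = @_;
    # Open an in-memory filehandle on the string (core Perl feature).
    open my $fh, '<', \$text or die "can't open in-memory handle: $!";
    # An empty $/ means paragraph mode: a record ends at one or more blank lines.
    # "local" restores the previous value when the sub returns.
    local $/ = '';
    my @paragraphs = <$fh>;
    close $fh;
    return @paragraphs;
}

my @p = read_paragraphs("first line\nsecond line\n\n\nthird line\n");
print scalar(@p), " paragraphs\n";   # prints "2 paragraphs"
```

Because `$/` is localized inside the sub, the rest of the CGI script keeps reading line by line as usual.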
 
sdesar (Author) Commented:
How can I call this subroutine in my CGI script?
Is it like this:
keywords(paragraphs);
print "$_:$paragraphs";

Since $paragraphs returns the keywords that are contained in that particular $paragraph

 
ozo Commented:
$paragraphs = keywords('file');
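
In a CGI script that call would normally come after the HTTP header has been printed. A rough sketch, assuming the subroutine is pasted into the same script (shown here in condensed form with a lexical filehandle) and that the file name is passed as the first command-line argument; the wrapper name `page_for` is hypothetical:

```perl
use strict;
use warnings;

# Condensed copy of the keywords() routine from the earlier post.
sub keywords {
    my $file = shift;
    open my $fh, '<', $file or die "can't open $file : $!";
    local $/ = '';                      # paragraph mode
    my (%seen, %paragraphs);
    while (my $para = <$fh>) {
        my %wc;
        while ($para =~ /(\w['\w-]*)/g) {
            $seen{lc $1}++;
            push @{ $paragraphs{lc $1} }, $para unless $wc{lc $1}++;
        }
    }
    close $fh;
    my @top = sort { $seen{$b} <=> $seen{$a} } keys %seen;
    @top = @top[0 .. 9] if @top > 10;   # at most ten words
    my $out = '';
    for my $word (@top) {
        my $chunk = "$word:\n@{ $paragraphs{$word} }\n";
        $chunk =~ s/\b(\Q$word\E)\b/<b>$1<\/b>/gi;
        $chunk =~ s/$/<br>/gm;
        $out .= $chunk;
    }
    return $out;
}

# Hypothetical wrapper: build the full CGI response for one input file.
sub page_for {
    my $file = shift;
    return "Content-type: text/html\n\n"
         . "<html><body>\n" . keywords($file) . "</body></html>\n";
}

# The file name is an assumption; pass it on the command line to try it out.
print page_for($ARGV[0]) if @ARGV;
```

The return value is an ordinary string, so it can be printed, stored in a variable, or written to a file.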
 
sdesar (Author) Commented:
Thanks a billion, OZO!!

Now how do I handle this:
if 3 out of 5 words occur frequently in certain paragraphs,
how can I print those paragraphs once, rather than printing them again and again and thus repeating the paragraphs?

For example, if these words
artificial
intelligence

occur frequently in paragraph 1 and paragraph 3,
instead of printing these paragraphs twice,
I want them to print once.


Thanks again.
Your help is greatly appreciated.
 
ozo Commented:
So you don't want the paragraphs listed separately by word?
 
sdesar (Author) Commented:
Hmm... no... tricky.

Here is an example with keywords:

k1 and k2 appear
in paragraph1 and paragraph2,
so it should display paragraph1 and paragraph2 only,
but not display them twice like right now:

k1:
paragraph1

paragraph2

k2:
paragraph1


paragraph2

The following should appear:

k1, k2:
paragraph1 - k1 this has k2

paragraph2 - this also has k1 and the keyword k2.


Basically, I want to increase efficiency by letting the user scan through the document quickly rather than reading the whole document.
Therefore, display only the paragraphs that contain keywords.
Hope this helps!!
 
ozo Commented:
# something like this?
sub keywords{
    my $file = shift;
    open FILE,"<$file" or die "can't open $file : $!";
    my %wc=();
    my %seen = ();
    my %top;
    my @words;
    my $paragraphs='';
    local $/='';
    my @paragraphs = <FILE>;
    close FILE;
    for( @paragraphs ){
        while( /(\w['\w-]*)/g ){
            $seen{lc $1}++;
        }
    }
    @top{(sort {$seen{$b} <=> $seen{$a} } keys %seen)[0..9]} = ();
    for( @paragraphs ){
        %wc = ();
        if( @words = grep {exists $top{lc $_} && !$wc{lc $_}++} /(\w['\w-]*)/g \){
            for my $w ( @words ){ s/\b(\Q$w\E)\b/<b>$1<\/b>/gi; }
            $paragraphs .= join ' ',"<h1>",@words,"</h1>:\n$_";
        }
    }
    $paragraphs=~s/$/<br>/gm;
    return $paragraphs;
}
 
sdesar (Author) Commented:
This is what I did with your code. Could you please help me debug it? It's giving an error when I compile.

case-space.txt is the file that contains the
data, i.e. text with the paragraphs.
The cp_to_file subroutine copies the data from the keywords subroutine, with boldface text, to a file: parah_keywords.txt

The lines that I am getting the error in are:
if( @words = grep {exists $top{lc $_} && !$wc{lc $_}++} /(\w['\w-]*)/g \){
                                 for my $w ( @words ){ s/\b(\Q$w\E)\b/<b>$1<\/b>/gi; }


Here's the code:

#!/usr/bin/perl --00 -n

$paragraphs = keywords('case-space.txt');
print $paragraphs;

cp_to_file($paragraphs, "parah_keywords.txt");

#----------subroutines-----------

sub keywords{
                         my $file = shift;
                         open FILE,"<$file" or die "can't open $file : $!";
                         my %wc=();
                         my %seen = ();
                         my %top;
                         my @words;
                         my $paragraphs='';
                         local $/='';
                         my @paragraphs = <FILE>;
                         close FILE;
                         for( @paragraphs ){
                             while( /(\w['\w-]*)/g ){
                                 $seen{lc $1}++;
                             }
                         }
                         @top{(sort {$seen{$b} <=> $seen{$a} } keys %seen)[0..9]} = ();
                         for( @paragraphs ){
                             %wc = ();
                             if( @words = grep {exists $top{lc $_} && !$wc{lc $_}++} /(\w['\w-]*)/g \){
                                 for my $w ( @words ){ s/\b(\Q$w\E)\b/<b>$1<\/b>/gi; }
                                 $paragraphs .= join ' ',"<h1>",@words,"</h1>:\n$_";
                             }
                         }
                         $paragraphs=~s/$/<br>/gm;
                         return $paragraphs;
                     }


# copies text to file

sub cp_to_file {
    my ($text, $to_file) = @_;

    open(OUT, ">" . $to_file);
    print OUT $text;
    close(OUT);
}


Awaiting your reply...
Thanks
 
sdesar (Author) Commented:
OZO, could you please help?

Here's the code again; I modified it a bit...

I am getting errors in the following lines:
Backslash found where operator expected at subr_keywords line 25, near "/(\w['\w-]*)/g \"
      (Missing operator before  \?)
syntax error at subr_keywords line 25, near "/(\w['\w-]*)/g \"
Can't use global $1 in "my" at subr_keywords line 26.

Here's the code for sub keywords...

#!/usr/bin/perl


#----------subroutines-----------

sub keywords{
                         my $file = shift;
                         open FILE,"<$file" or die "can't open $file : $!";
                         my %wc=();
                         my %seen = ();
                         my %top;
                         my @words;
                         my $paragraphs='';
                         local $/='';
                         my @paragraphs = <FILE>;
                         close FILE;
                         for( @paragraphs ){
                             while( /(\w['\w-]*)/g ){
                                 $seen{lc $1}++;
                             }
                         }
                         @top{(sort {$seen{$b} <=> $seen{$a} } keys %seen)[0..9]} = ();
                         for( @paragraphs ){
                             %wc = ();
                             if( @words = grep {exists $top{lc $_} && !$wc{lc $_}++} /(\w['\w-]*)/g \){
                                 for my $w ( @words ){ s/\b(\Q$w\E)\b/<b>$1<\/b>/gi; }
                                 $paragraphs .= join ' ',"<h1>",@words,"</h1>:\n$_";
                             }
                         }
                         $paragraphs=~s/$/<br>/gm;
                         return $paragraphs;
                     }


$paragraphs=keywords('fileparse.txt');
print "$_:$paragraphs";

open FILE2, ">fileKW.html" or die "can't open fileKW because $!";
print FILE2 keywords("fileparse.txt");
close FILE2;
 
ozo Commented:
Removing the \ after the g should eliminate the errors on lines 25 and 26.
I see that my post had it too; sorry about the typo.
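
With the stray backslash gone, the `if` line parses as intended. For reference, a sketch of the whole routine with that fix applied. The lexical filehandle, `use strict`, the per-paragraph copy, and the guard for files with fewer than ten distinct words are additions for illustration, not part of the original post:

```perl
use strict;
use warnings;

sub keywords {
    my $file = shift;
    open my $fh, '<', $file or die "can't open $file : $!";
    local $/ = '';                       # paragraph mode
    my @paragraphs = <$fh>;
    close $fh;

    # Count every word across the whole file.
    my %seen;
    for my $para (@paragraphs) {
        $seen{lc $1}++ while $para =~ /(\w['\w-]*)/g;
    }

    # Keep the ten most frequent words (fewer if the file is small).
    my @ranked = sort { $seen{$b} <=> $seen{$a} } keys %seen;
    $#ranked = 9 if @ranked > 10;
    my %top = map { $_ => 1 } @ranked;

    my $out = '';
    for my $para (@paragraphs) {
        my %wc;
        # Top words present in this paragraph, each listed once.
        my @words = grep { $top{lc $_} && !$wc{lc $_}++ }
                    $para =~ /(\w['\w-]*)/g;
        next unless @words;
        my $copy = $para;
        for my $w (@words) { $copy =~ s/\b(\Q$w\E)\b/<b>$1<\/b>/gi; }
        $out .= join ' ', "<h1>", @words, "</h1>:\n$copy";
    }
    $out =~ s/$/<br>/gm;
    return $out;
}
```

Each paragraph now appears at most once, headed by the top words it contains.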
 
sdesar (Author) Commented:
Thanks OZO.... Thanks a million!!

Your suggestion worked as always...

Now there's another problem.

I want the keywords to be boldfaced, but they should be bold in the ORIGINAL file.

1. The txt file is stripped of all the prepositions, articles, etc., using fileparse.pl, and the result is saved in fileparse.txt. This is the text file.

2. Then I use the above routine to find the keywords in the text file (fileparse.txt in the above case). It also bolds the keywords and puts the result in fileKW.html.

THE PROBLEM:
I want the keywords to be bold in the original txt file, i.e. test.out in this case. This file is read by the routine below.


#!/usr/bin/perl
# FILE- file2parse.pl
open KW, 'stop_words' or die "can't open stop_words : $!";
@kw = map {chomp;$_} <KW>;
close KW;

#form RE

$re = join '\b)|(\b','(\b', @kw,'\b)';

open TXT,'test.out' or die "can't open test.out : $!";
while(<TXT>){
  s/$re//goi;
  $content .= $_;
}

print $content;


cp_to_file($content, "fileparse.txt");
print "\n\n\n";


# copies text to file

sub cp_to_file {
    my ($text, $to_file) = @_;

    open(OUT, ">" . $to_file);
    print OUT $text;
    close(OUT);
}


Awaiting your response.....

Thanks
 
sdesar (Author) Commented:
Help Please!!
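
One possible way to get the bold keywords into the original text, sketched under assumptions: pick the keywords from the stop-word-filtered text as before, then apply the highlighting to the unfiltered text instead. The helper names `top_words` and `bold_keywords` and the inline sample strings are hypothetical, not part of the scripts above:

```perl
use strict;
use warnings;

# Hypothetical helper: the $n most frequent words in a block of text.
sub top_words {
    my ($text, $n) = @_;
    my %seen;
    $seen{lc $1}++ while $text =~ /(\w['\w-]*)/g;
    # Sort by count, then alphabetically so ties come out in a stable order.
    my @ranked = sort { $seen{$b} <=> $seen{$a} or $a cmp $b } keys %seen;
    $#ranked = $n - 1 if @ranked > $n;
    return @ranked;
}

# Hypothetical helper: wrap whole-word, case-insensitive matches in <b>...</b>.
sub bold_keywords {
    my ($text, @keywords) = @_;
    return $text unless @keywords;
    # One alternation, longest keyword first, so the text is scanned in a
    # single pass and the inserted <b> tags are never matched themselves.
    my $alt = join '|', map { quotemeta($_) }
              sort { length($b) <=> length($a) } @keywords;
    $text =~ s/\b($alt)\b/<b>$1<\/b>/gi;
    return $text;
}

# Sample data standing in for test.out (original) and fileparse.txt (filtered).
my $original = "The cat sat on the mat. The cat slept.\n";
my $filtered = " cat sat   mat.  cat slept.\n";

my @keywords = top_words($filtered, 1);
print bold_keywords($original, @keywords);
# prints: The <b>cat</b> sat on the mat. The <b>cat</b> slept.
```

For real files, `$original` would be read from test.out and `$filtered` from fileparse.txt, and the result printed to fileKW.html.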