Solved

Given a list of words, how do I count them in a text file?

Posted on 2000-05-15
Last Modified: 2010-03-05
How can I count the list of words?
File1 - contains a list of words
testing
automate
discrete
measure
seem
..
File2 - text file that contains these words in different paragraphs.

File3 - should output the following
           P1  P2  P3....Pn ..paragraphs

testing    1    0   1
automate   5    0   4
discrete   3    3   3
measure    3    5   5
seem       2    1   0


How can I achieve this word count?
Question by:sdesar

Expert Comment

by:geotiger
ID: 2815608
I created a text file and a script that does what you want. To load your own word list file into @a, you can use:

$fn1 = "/your/file/name";
open FILE, "<$fn1" or die "$!\n";
@a = <FILE>;
close FILE;
 

$ more cnt_words.txt
=head2 How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency: If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> function like so:

    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X charcters in the string";

This is fine if you are just looking for a single character.  However,
if you are trying to count multiple character substrings within a
larger string, C<tr///> won't work.  What you can do is wrap a while()
loop around a global pattern match.  For example, let's count negative
integers:

    $string = "-9 55 48 -2 23 -76 4 14 -44";
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";

=head1 Found in /usr/local/lib/perl5/5.00503/pod/perlfaq5.pod

=head2 How do I count the number of lines in a file?

One fairly efficient way is to count newlines in the file. The
following program uses a feature of tr///, as documented in L<perlop>.
If your text file doesn't end with a newline, then it's not really a
proper text file, so this may report one fewer line than you expect.

    $lines = 0;
    open(FILE, $filename) or die "Can't open `$filename': $!";
    while (sysread FILE, $buffer, 4096) {
        $lines += ($buffer =~ tr/\n//);
    }
    close FILE;

This assumes no funny games with newline translations.

$ more cnt_words.pl
#!/usr/local/bin/perl
# file name cnt_words.pl


@a = split /,/, "are,end,text,work,for,with,game,count,pattern,number";
# you can open your first file to get the content into @a

$fn2 = "cnt_words.txt";

open WD, "<$fn2" or die "$!\n";
@b = <WD>;
close WD;

$p = 1;    # paragraph counter
%R =();
foreach $i (@b) {    # loop through each line
    if ($i =~ /^\n$/) {  ++$p; }
    foreach $j (@a) {  
        if ($i =~  /$j/) { ++$R{$j}[$p]; } else { $R{$j}[$p] += 0; }
    }
}

for $i (sort keys %R) {
    $t = "";
    for $j (0..$#{$R{$i}}) { $t .= " $R{$i}[$j]"; }
    printf "%10s %-30s\n", $i, $t;
}

$ ./cnt_words.pl
       are   0 1 1 2 1 0 0 0 0 0 0      
     count   1 1 2 2 2 0 1 1 0 0 0      
       end   0 0 0 0 0 0 0 1 0 0 0      
       for   0 0 0 1 0 0 0 0 0 0 0      
      game   0 0 0 0 0 0 0 0 0 1 0      
    number   1 1 0 0 1 0 1 0 0 0 0      
   pattern   0 0 0 1 0 0 0 0 0 0 0      
      text   0 0 0 0 0 0 0 2 0 0 0      
      with   1 2 0 1 0 0 0 1 0 1 0      
      work   0 0 0 1 0 0 0 0 0 0 0      
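
Note that the script above counts the lines that contain each pattern, not the individual occurrences, and /$j/ also matches inside longer words (for example "are" inside "share"). If whole-word, per-occurrence counts are what is wanted, a minimal sketch could look like the following. The sub name count_words and the sample text are illustrative only and are not part of the original script.

```perl
use strict;
use warnings;

# Count whole-word occurrences of each keyword in each paragraph.
# $text is the whole document; paragraphs are separated by blank lines.
# Returns a hash ref: keyword => array ref of counts, one per paragraph.
sub count_words {
    my ($text, @keywords) = @_;
    my @paragraphs = grep { /\S/ } split /\n\s*\n/, $text;
    my %counts;
    for my $word (@keywords) {
        for my $para (@paragraphs) {
            # \b anchors at word boundaries; \Q...\E quotes regex metacharacters.
            # The "() =" idiom counts the matches found by the /g match.
            my $n = () = $para =~ /\b\Q$word\E\b/gi;
            push @{ $counts{$word} }, $n;
        }
    }
    return \%counts;
}

# Hypothetical sample text with two paragraphs separated by two blank lines.
my $text = "The game is a game.\nCount the count.\n\n\nNo games here, only a game.\n";
my $counts = count_words($text, qw(game count));
for my $word (sort keys %$counts) {
    printf "%10s %s\n", $word, join ' ', @{ $counts->{$word} };
}
```

Here "games" is not counted as "game" because of the \b anchors, and the match is case-insensitive because of the /i flag; drop either if the looser behaviour of the original script is preferred.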

Author Comment

by:sdesar
ID: 2829432
Instead of this line, how can I get the list of words from a text file?

@a = split /,/, "are,end,text,work,for,with,game,count,pattern,number";

And is cnt_words.txt the word document (WD) that contains a bunch of text in different paragraphs that include the above words (are, end, text, etc.)?


Thanks.

Hope to hear from you soon.

Expert Comment

by:geotiger
ID: 2830782
"Instead of  this line how can I get the llist of words in the text file --

@a = split /,/, "are,end,text,work,for,with,game,count,pattern,number";
"

Assuming you have one word per line, here is how:

$fn="/dir/to/my/file/name";

open FILE, "<$fn" or die "Could not open the file - $fn:$!\n";
@a=<FILE>;
close FILE;

"And the file cnt_words.txt is that the word document (WD)  that cointains a bunch of text with diff. pargraphs that have the above words -
are, end, text etc..... "

That is right. Put your source text in the file named by $fn2 (cnt_words.txt).


Author Comment

by:sdesar
ID: 2831118
This is what I did.
But when I run it at the command prompt:

$perl cnt_words.pl
No such file or directory

That's the message I am getting: "No such file or directory", even though I can see the file in my directory.
I also changed the permissions of the file:
chmod 755 cnt_words.pl


Here's the file, cnt_words.pl:

#!/usr/bin/perl
rds.pl.swp
 # file name cnt_words.pl
$fn1="P1fileParse1.txt"; // input file that has the text data
open FILE, "<$fn1" or die "\$!\n";
@a=<FILE>;
close FILE;

 #   @a = split /,/, "are,end,text,work,for,with,game,count,pattern,number";
                    # you can open your first file to get the content into @a

                    $fn2 = "cnt_words.txt";

                    open WD, "<$fn2" or die "$!\n";
                    @b = <WD>;
                    close WD;

                    $p = 1;    # paragram counter
                    %R =();
                    foreach $i (@b) {    # loop through each line
 if ($i =~ /^\n$/) {  ++$p; }
                        foreach $j (@a) {
                            if ($i =~  /$j/) { ++$R{$j}[$p]; } else { $R{$j}[$p]
 += 0; }
                        }
                    }

                    for $i (sort keys %R) {
                        $t = "";
                        for $j (0..$#{$R{$i}}) { $t .= " $R{$i}[$j]"; }
                        printf "%10s %-30s\n", $i, $t;
                    }



Could you please help me debug this code? It looks really easy, yet it is a mystery to me.

Awaiting your response.



Author Comment

by:sdesar
ID: 2831162
I changed the above script to have the following:

$fn1="cnt_words.out";     # output file

$fn2 = "cnt_words.txt";   #input WD word document file


Then I typed perl cnt_words.pl,

but there's no output in cnt_words.out.


I don't understand why.

Could you please give me some suggestions?

Expert Comment

by:geotiger
ID: 2834288
You need to put "./" in front of the command after you cd to the directory, i.e.,

cd /my/dir/has/cnt_words.pl

./cnt_words.pl

What is "rds.pl.swp" in your code?

$fn1 should be your input file containing the list of keywords to search for in $fn2. If you want the output written to a file, add the following code at the end:

$fn3 = "myoutputfile.out";
open OUT, ">$fn3" or die "Could not write to file - $fn3:$!\n";

for $i (sort keys %R) {
  $t = "";
  for $j (0..$#{$R{$i}}) { $t .= " $R{$i}[$j]"; }
  printf OUT "%10s %-30s\n", $i, $t;
}
close OUT;



Author Comment

by:sdesar
ID: 2837478
Here are the files and the results... I can't figure out why I am getting 0s....

#!/usr/bin/perl
# file name cnt_words.pl
$fn1="cnt_keywords.out";                # keywords  file
open FILE, "<$fn1" or die "could not open the file -$!|\n";
@a=<FILE>;
close FILE;

 #   @a = split /,/, "are,end,text,work,for,with,game,count,pattern,number";
                    # you can open your first file to get the content into @a

                    $fn2 = "cnt_words1.txt";   #input word document WD file

                    open WD, "<$fn2" or die "$!\n";
                    @b = <WD>;
                    close WD;


                    $p = 1;    # paragram counter
                    %R =();
 foreach $i (@b) {    # loop through each line
                        if ($i =~ /^\n$/) {  ++$p; }
                        foreach $j (@a) {
                            if ($i =~  /$j/) { ++$R{$j}[$p]; } else { $R{$j}[$p]
 += 0; }
                        }
                    }

$fn3 = "cnt_words.out";
open OUT, ">$fn3" or die "Could not write to file - $fn3:$!\n";

                    for $i (sort keys %R) {
                        $t = "";
                        for $j (0..$#{$R{$i}}) { $t .= " $R{$i}[$j]"; }
                        printf OUT "%10s %-30s\n", $i, $t;
                    }
close OUT;


This is the words document -
cnt_words1.txt
This a a test file check it out and I hope that this works finally and the work,
 for, with, game,count,pattern,number are in it.
and ths has some words, test , sentences..

This is the Keywords file -
cnt_keywords.out
test
game
count
the
is
a
seems
there
check
works
hope
finally


This is the output/result file -
cnt_words.out
a
   0 0
    check
   0 0
    count
   0 0
  finally
   0 0
     game
   0 0
     hope
   0 0
       is
   0 0
    seems
   0 0
     test
   0 0
      the
   0 0
    there
   0 0
    works


This output file has all 0s...
Do you have any suggestions to fix this?

Thanks




Expert Comment

by:geotiger
ID: 2837984
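The reason is the "\n" character at the end of each keyword. @a=<FILE> keeps the trailing newline on every entry, so /$j/ can only match where the keyword is immediately followed by a line break. I rewrote the code to chomp each keyword as it is read into @a, and it now works as expected. Here are the files and results: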

$ more cnt_keys.txt
are
end
text
work
for
with
game
count
pattern
number

$ more cnt_words.pl
#!/usr/local/bin/perl
# file name cnt_words.pl

# @a=split /,/, "are,end,text,work,for,with,game,count,pattern,number";
# you can open your first file to get the content into @a

$fn1 = "cnt_keys.txt";
$fn2 = "cnt_words.txt";
$fn3 = "cnt_out.txt";
open FILE, "<$fn1" or die "$!\n";
while (<FILE>) {
  chomp;
  next if (!$_);
  push @a, $_;
}
close FILE;

open WD, "<$fn2" or die "$!\n";
@b = <WD>;
close WD;

$p = 1;    # paragraph counter
%R =();
foreach $i (@b) {    # loop through each line
    if ($i =~ /^\n$/) {  ++$p; }
    foreach $j (@a) {  
        if ($i =~  /$j/) { ++$R{$j}[$p]; } else { $R{$j}[$p] += 0; }
    }
}

for $i (sort keys %R) {
    $t = "";
    for $j (0..$#{$R{$i}}) { $t .= " $R{$i}[$j]"; }
    printf "%10s %-30s\n", $i, $t;
}

open OUT, ">$fn3" or die "could not write to $fn3:$!\n";
for $i (sort keys %R) {
    $t = "";
    for $j (0..$#{$R{$i}}) { $t .= " $R{$i}[$j]"; }
    printf OUT "%10s %-30s\n", $i, $t;
}
close OUT;


$ ./cnt_words.pl
       are   0 1 1 2 1 0 0 0 0 0 0      
     count   1 1 2 2 2 0 1 1 0 0 0      
       end   0 0 0 0 0 0 0 1 0 0 0      
       for   0 0 0 1 0 0 0 0 0 0 0      
      game   0 0 0 0 0 0 0 0 0 1 0      
    number   1 1 0 0 1 0 1 0 0 0 0      
   pattern   0 0 0 1 0 0 0 0 0 0 0      
      text   0 0 0 0 0 0 0 2 0 0 0      
      with   1 2 0 1 0 0 0 1 0 1 0      
      work   0 0 0 1 0 0 0 0 0 0 0      
$ more cnt_out.txt
       are   0 1 1 2 1 0 0 0 0 0 0      
     count   1 1 2 2 2 0 1 1 0 0 0      
       end   0 0 0 0 0 0 0 1 0 0 0      
       for   0 0 0 1 0 0 0 0 0 0 0      
      game   0 0 0 0 0 0 0 0 0 1 0      
    number   1 1 0 0 1 0 1 0 0 0 0      
   pattern   0 0 0 1 0 0 0 0 0 0 0      
      text   0 0 0 0 0 0 0 2 0 0 0      
      with   1 2 0 1 0 0 0 1 0 1 0      
      work   0 0 0 1 0 0 0 0 0 0 0      
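
The newline problem described above can be reproduced in isolation. The following is a small sketch (the variable names and the sample line are made up for illustration) that matches the same line against a keyword as it comes out of @a = <FILE> and against the chomped keyword.

```perl
use strict;
use warnings;

# A line of text as it would be read from the word document.
my $line = "this is a test file\n";

my $raw   = "test\n";   # keyword as read by @a = <FILE>, newline still attached
my $clean = $raw;
chomp $clean;           # keyword after chomp, newline removed

# The unchomped pattern requires "test" followed by a newline, which does
# not occur here because "test" sits in the middle of the line.
my $raw_hit   = ($line =~ /$raw/)   ? 1 : 0;
my $clean_hit = ($line =~ /$clean/) ? 1 : 0;

print "raw: $raw_hit, chomped: $clean_hit\n";   # prints "raw: 0, chomped: 1"
```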

Author Comment

by:sdesar
ID: 2839681
Thanks, it works. But how can I place the paragraph numbers at the top?

      P1 P2 P3.... Pn
are   0  1  1  2  1  0
count 1  1  2  2  2  0
end   0  0  0  0  0  0

Thanks a million....





Expert Comment

by:geotiger
ID: 2839937
Just use the following code for output:

$t = "           ";
# $i is not set outside the loops, so size the header from the first keyword's row
for $j (0..$#{$R{$a[0]}}) { $t .= sprintf " P%02d", $j; }
print "$t\n";

for $i (sort keys %R) {
    $t = "";
    for $j (0..$#{$R{$i}}) { $t .= sprintf " %3d", $R{$i}[$j]; }
    printf "%10s %-30s\n", $i, $t;
}

open OUT, ">$fn3" or die "could not write to $fn3:$!\n";
$t = "           ";
for $j (0..$#{$R{$a[0]}}) { $t .= sprintf " P%02d", $j; }
print OUT "$t\n";

for $i (sort keys %R) {
    $t = "";
    for $j (0..$#{$R{$i}}) { $t .= sprintf " %3d", $R{$i}[$j]; }
    printf OUT "%10s %-30s\n", $i, $t;
}
close OUT;
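
Since the same header and row loops are needed once for the screen and once for the file, they could be wrapped in a sub that takes the filehandle as an argument. This is only a sketch: print_table, its parameters, and the sample %R data are hypothetical and not part of the script above.

```perl
use strict;
use warnings;

# Print the header line and one row per keyword to the given filehandle.
# $R is a reference to a hash of arrays (keyword => per-paragraph counts);
# $first is the first paragraph index to show (0 or 1).
sub print_table {
    my ($fh, $R, $first) = @_;
    my @words = sort keys %$R;
    return unless @words;

    # Use the first keyword's row to find the last paragraph index.
    my $last = $#{ $R->{ $words[0] } };

    my $header = join '', map { sprintf ' P%02d', $_ } $first .. $last;
    print {$fh} ' ' x 11, $header, "\n";

    for my $w (@words) {
        my $row = join '', map { sprintf ' %3d', $R->{$w}[$_] || 0 } $first .. $last;
        printf {$fh} "%10s %s\n", $w, $row;
    }
}

# Made-up counts; index 0 is unused because the paragraph counter starts at 1.
my %R = (are => [0, 0, 1, 1], count => [0, 1, 1, 2]);
print_table(\*STDOUT, \%R, 1);
```

The same call with an output filehandle in place of \*STDOUT writes the identical table to a file, so the two copies of the loops cannot drift apart.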



Author Comment

by:sdesar
ID: 2841578
Thanks Geotiger....
Also, since my text has paragraphs, there are 2 blank lines between them in cnt_words.txt. How can I have them treated as just one line instead of 2? The extra line is treated as a paragraph of its own and therefore shows all 0s. I also need to display the paragraph numbers:
P1   P2   P3 ..... Pn


         P1  P2    P3   P4   ....
list      0   0    2     3
of        0   4    4     0
keywords  0   6    4     2
in
the


Thanks


Author Comment

by:sdesar
ID: 2849974
Here's what the text file cnt_words.txt looks like (notice it has 2 blank lines after each paragraph). How can I have the code ignore one of the lines so that the 0s seen above don't appear, or is there another way to avoid it?

 artificial intelligence   direct application  problems
 immediate  outside   ai community.   example,
 project (skipper)   research group   development
 intelligent agents  web elements, informational needs  tastes   user.


 skipper project  distinct     ways
ongoing research efforts   area  intelligent web-oriented
agents.   user profiles   used  customize  form
 content  -line information   manner  meets
specific informational needs   web-browsing individual.  sets
skipper apart   similarly minded tools   fact  skipper
 sit   background   web-browser  extract user profiles
 manner    unobtrusive, .e., requires minimal explicit
statements    feedback   user. unobtrusive tools



Expert Comment

by:ozo
ID: 2851108
What line do you want to ignore?

Accepted Solution

by:
geotiger earned 50 total points
ID: 2851483
Use the following code to ignore the extra empty lines between paragraphs.


$ cat cnt_words.pl
#!/usr/local/bin/perl
# file name cnt_words.pl

# @a=split /,/, "are,end,text,work,for,with,game,count,pattern,number";
# you can open your first file to get the content into @a

$fn1 = "cnt_keys.txt";
$fn2 = "cnt_words.txt";
$fn3 = "cnt_out.txt";
$fn4 = "cnt_out2.txt";
open FILE, "<$fn1" or die "$!\n";
while (<FILE>) {
  chomp;
  next if (!$_);
  push @a, $_;
}
close FILE;

open WD, "<$fn2" or die "$!\n";
@b = <WD>;
close WD;

$p = 1;    # paragraph counter
%R =();
my $lastline="";
foreach $i (@b) {    # loop through each line
    foreach $j (@a) {  
        if ($i =~  /$j/) { ++$R{$j}[$p]; } else { $R{$j}[$p] += 0; }
    }
    if ($i =~ /^\n$/ && $lastline !~ /^\n$/ ) {  ++$p; }
    $lastline=$i;
}

for $i (sort keys %R) {
    $t = "";
    for $j (1..$#{$R{$i}}) { $t .= " $R{$i}[$j]"; }
    printf "%10s %-30s\n", $i, $t;
}

open OUT, ">$fn3" or die "could not write to $fn3:$!\n";
for $i (sort keys %R) {
    $t = "";
    for $j (1..$#{$R{$i}}) { $t .= " $R{$i}[$j]"; }
    printf OUT "%10s %-30s\n", $i, $t;
}
close OUT;


$t = "           ";
for $j (1..$#{$R{$a[0]}}) { $t .= sprintf " P%02d", $j; }
print "$t\n";

for $i (sort keys %R) {
    $t = "";
    for $j (1..$#{$R{$i}}) { $t .= sprintf " %3d", $R{$i}[$j]; }
    printf "%10s %-30s\n", $i, $t;
}

open OUT, ">$fn4" or die "could not write to $fn4:$!\n";
$t = "           ";
# $i is not set outside the loops, so size the header from the first keyword's row
for $j (1..$#{$R{$a[0]}}) { $t .= sprintf " P%02d", $j; }
print OUT "$t\n";

for $i (sort keys %R) {
    $t = "";
    for $j (1..$#{$R{$i}}) { $t .= sprintf " %3d", $R{$i}[$j]; }
    printf OUT "%10s %-30s\n", $i, $t;
}
close OUT;
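
Another way to handle the extra blank lines, instead of tracking the previous line, is Perl's paragraph mode: when $/ is set to the empty string, each read returns a whole paragraph and any run of blank lines counts as a single separator. The sketch below assumes that approach; read_paragraphs and the in-memory sample are illustrative and stand in for reading cnt_words.txt.

```perl
use strict;
use warnings;

# Read all paragraphs from a filehandle using paragraph mode.
sub read_paragraphs {
    my ($fh) = @_;
    local $/ = "";          # paragraph mode: blank-line runs separate records
    my @paragraphs = <$fh>;
    chomp @paragraphs;      # in paragraph mode chomp strips all trailing newlines
    return @paragraphs;
}

# In-memory sample with several blank lines between and after paragraphs.
my $sample = "first paragraph\nsecond line\n\n\n\nsecond paragraph\n\n\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!\n";
my @paras = read_paragraphs($fh);
close $fh;

printf "P%02d: %s\n", $_ + 1, $paras[$_] for 0 .. $#paras;
```

Each element of @paras can then be searched for the keywords, with the array index plus one serving as the paragraph number, so no all-zero columns appear for the extra blank lines.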

Author Comment

by:sdesar
ID: 2851771
It works!!
Thanks geotiger!!

Author Comment

by:sdesar
ID: 2851775
Comment accepted as answer

Author Comment

by:sdesar
ID: 2851776
Thanks Again!!