Solved: Given a list of words, how do I count them in a text file?

I created a text file and a script that does what you want. To load your own word list into @a, you can use:

$fn1 = "/your/file/name";
open FILE, "<$fn1" or die "$!\n";
@a = <FILE>;
close FILE;

$ more cnt_words.txt
=head2 How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency: If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> function like so:

$string = "ThisXlineXhasXsomeXx'sXinXit";
$count = ($string =~ tr/X//);
print "There are $count X charcters in the string";

This is fine if you are just looking for a single character. However,
if you are trying to count multiple character substrings within a
larger string, C<tr///> won't work. What you can do is wrap a while()
loop around a global pattern match. For example, let's count negative
integers:

$string = "-9 55 48 -2 23 -76 4 14 -44";
while ($string =~ /-\d+/g) { $count++ }
print "There are $count negative numbers in the string";

=head1 Found in /usr/local/lib/perl5/5.00503/pod/perlfaq5.pod

=head2 How do I count the number of lines in a file?

One fairly efficient way is to count newlines in the file. The
following program uses a feature of tr///, as documented in L<perlop>.
If your text file doesn't end with a newline, then it's not really a
proper text file, so this may report one fewer line than you expect.

$lines = 0;
open(FILE, $filename) or die "Can't open `$filename': $!";
while (sysread FILE, $buffer, 4096) {
$lines += ($buffer =~ tr/\n//);
}
close FILE;

This assumes no funny games with newline translations.

$ more cnt_words.pl
#!/usr/local/bin/perl
# file name cnt_words.pl

@a = split /,/, "are,end,text,work,for,with,game,count,pattern,number";
# you can open your first file to get the content into @a

$fn2 = "cnt_words.txt";

open WD, "<$fn2" or die "$!\n";
@b = <WD>;
close WD;

$p = 1; # paragraph counter
%R = ();
foreach $i (@b) { # loop through each line
    if ($i =~ /^\n$/) { ++$p; } # a blank line starts a new paragraph
    foreach $j (@a) {
        if ($i =~ /$j/) { ++$R{$j}[$p]; } else { $R{$j}[$p] += 0; }
    }
}

for $i (sort keys %R) {
    $t = "";
    for $j (0..$#{$R{$i}}) { $t .= " $R{$i}[$j]"; }
    printf "%10s %-30s\n", $i, $t;
}

$ ./cnt_words.pl
are 0 1 1 2 1 0 0 0 0 0 0
count 1 1 2 2 2 0 1 1 0 0 0
end 0 0 0 0 0 0 0 1 0 0 0
for 0 0 0 1 0 0 0 0 0 0 0
game 0 0 0 0 0 0 0 0 0 1 0
number 1 1 0 0 1 0 1 0 0 0 0
pattern 0 0 0 1 0 0 0 0 0 0 0
text 0 0 0 0 0 0 0 2 0 0 0
with 1 2 0 1 0 0 0 1 0 1 0
work 0 0 0 1 0 0 0 0 0 0 0
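
A side note on the matching: the test $i =~ /$j/ adds one for every line that contains the word anywhere, so "for" is also found inside "before", and a word that appears twice on one line is counted once. If you would rather count whole-word occurrences, something like the sketch below might do. The helper name count_word is made up for illustration and is not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): count how many
# times $word occurs in $line as a whole word.
sub count_word {
    my ($line, $word) = @_;
    # \b anchors the match at word boundaries; \Q...\E quotes any regex
    # metacharacters in $word. Assigning the match to an empty list
    # forces list context, so $n receives the number of matches.
    my $n = () = $line =~ /\b\Q$word\E\b/g;
    return $n;
}

print count_word("count the count of counts", "count"), "\n";   # prints 2
print count_word("before the form", "for"), "\n";               # prints 0
```

In the main loop you could then add count_word($i, $j) to $R{$j}[$p] instead of incrementing by one.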

sdesar

ASKER

Instead of this line, how can I get the list of words from a text file?

@a = split /,/, "are,end,text,work,for,with,game,count,pattern,number";

And is the file cnt_words.txt the word document (WD) that contains a bunch of text in different paragraphs with the above words (are, end, text, etc.)?

thanks

hope to hear from you soon....

geotiger

"Instead of this line, how can I get the list of words from a text file?

@a = split /,/, "are,end,text,work,for,with,game,count,pattern,number";
"

Assuming you have one word per line, here is how:

$fn="/dir/to/my/file/name";

open FILE, "<$fn" or die "Could not open the file - $fn: $!\n";
@a=<FILE>;
close FILE;

"And is the file cnt_words.txt the word document (WD) that contains a bunch of text in different paragraphs with the above words (are, end, text, etc.)?"

That is right. Put your source text in the file named by $fn2 (cnt_words.txt).

sdesar

ASKER

This is what I did, but when I run it at the command prompt:

$perl cnt_words.pl
No such file or directory

That's the message I get, even though I do see this file in my directory.
I also changed the permissions of the file with
chmod 755 cnt_words.pl

Here's the file cnt_words.pl:

#!/usr/bin/perl
rds.pl.swp
# file name cnt_words.pl
$fn1="P1fileParse1.txt"; # input file that has the text data
open FILE, "<$fn1" or die "\$!\n";
@a=<FILE>;
close FILE;

# @a = split /,/, "are,end,text,work,for,with,game,count,pattern,number";
# you can open your first file to get the content into @a

$fn2 = "cnt_words.txt";

open WD, "<$fn2" or die "$!\n";
@b = <WD>;
close WD;

$p = 1; # paragraph counter
%R = ();
foreach $i (@b) { # loop through each line
    if ($i =~ /^\n$/) { ++$p; }
    foreach $j (@a) {
        if ($i =~ /$j/) { ++$R{$j}[$p]; } else { $R{$j}[$p] += 0; }
    }
}

for $i (sort keys %R) {
    $t = "";
    for $j (0..$#{$R{$i}}) { $t .= " $R{$i}[$j]"; }
    printf "%10s %-30s\n", $i, $t;
}

Could you please help me debug this code? It should be really easy, yet it is a mystery to me.

Awaiting your response.

sdesar

ASKER

I changed the above script to have the following:

$fn1="cnt_words.out"; # output file

$fn2 = "cnt_words.txt"; #input WD word document file

Then I typed perl cnt_words.pl, but there's no output in cnt_words.out.

I don't understand why.

Could you please give suggestions?

geotiger

You need to put "./" in front of the command after you cd to the directory, i.e.,

cd /my/dir/has/cnt_words.pl

./cnt_words.pl

What is "rds.pl.swp" in your code?

$fn1 should be your input file containing the list of keywords to search for in $fn2. If you want the output written to a file, add the following code to the end:

$fn3 = "myoutputfile.out";
open OUT, ">$fn3" or die "Could not write to file - $fn3:$!\n";

for $i (sort keys %R) {
    $t = "";
    for $j (0..$#{$R{$i}}) { $t .= " $R{$i}[$j]"; }
    printf OUT "%10s %-30s\n", $i, $t;
}
close OUT;

sdesar

ASKER

Here are the files and the results... I can't figure out why I am getting 0s....

#!/usr/bin/perl
# file name cnt_words.pl
$fn1="cnt_keywords.out"; # keywords file
open FILE, "<$fn1" or die "could not open the file - $fn1: $!\n";
@a=<FILE>;
close FILE;

# @a = split /,/, "are,end,text,work,for,with,game,count,pattern,number";
# you can open your first file to get the content into @a

$fn2 = "cnt_words1.txt"; #input word document WD file

open WD, "<$fn2" or die "$!\n";
@b = <WD>;
close WD;

$p = 1; # paragraph counter
%R = ();
foreach $i (@b) { # loop through each line
    if ($i =~ /^\n$/) { ++$p; }
    foreach $j (@a) {
        if ($i =~ /$j/) { ++$R{$j}[$p]; } else { $R{$j}[$p] += 0; }
    }
}

$fn3 = "cnt_words.out";
open OUT, ">$fn3" or die "Could not write to file - $fn3:$!\n";

for $i (sort keys %R) {
    $t = "";
    for $j (0..$#{$R{$i}}) { $t .= " $R{$i}[$j]"; }
    printf OUT "%10s %-30s\n", $i, $t;
}
close OUT;

This is the words document -
cnt_words1.txt
This a a test file check it out and I hope that this works finally and the work,
for, with, game,count,pattern,number are in it.
and ths has some words, test , sentences..

This is the Keywords file -
cnt_keywords.out
test
game
count
the
is
a
seems
there
check
works
hope
finally

This is the output/result file -
cnt_words.out
a
0 0
check
0 0
count
0 0
finally
0 0
game
0 0
hope
0 0
is
0 0
seems
0 0
test
0 0
the
0 0
there
0 0
works

This output file has all 0s...
Do you have any suggestions to fix this?

Thanks

geotiger

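The cause was the "\n" character at the end of each keyword: @a=<FILE> keeps the line terminator, so /$j/ only matches where the word is immediately followed by a newline. I rewrote the code to chomp each keyword as it is read into @a, and it now works as expected. Here are the files and results: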

$ more cnt_keys.txt
are
end
text
work
for
with
game
count
pattern
number

$ more cnt_words.pl
#!/usr/local/bin/perl
# file name cnt_words.pl

# @a=split /,/, "are,end,text,work,for,with,game,count,pattern,number";
# you can open your first file to get the content into @a

$fn1 = "cnt_keys.txt";
$fn2 = "cnt_words.txt";
$fn3 = "cnt_out.txt";
open FILE, "<$fn1" or die "$!\n";
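while (<FILE>) {
    chomp;              # strip the trailing newline from each keyword
    next unless length; # skip blank lines (but keep a keyword such as "0")
    push @a, $_;
}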
close FILE;

open WD, "<$fn2" or die "$!\n";
@b = <WD>;
close WD;

$p = 1; # paragraph counter
%R = ();
foreach $i (@b) { # loop through each line
    if ($i =~ /^\n$/) { ++$p; } # a blank line starts a new paragraph
    foreach $j (@a) {
        if ($i =~ /$j/) { ++$R{$j}[$p]; } else { $R{$j}[$p] += 0; }
    }
}

for $i (sort keys %R) {
    $t = "";
    for $j (0..$#{$R{$i}}) { $t .= " $R{$i}[$j]"; }
    printf "%10s %-30s\n", $i, $t;
}

open OUT, ">$fn3" or die "could not write to $fn3:$!\n";
for $i (sort keys %R) {
    $t = "";
    for $j (0..$#{$R{$i}}) { $t .= " $R{$i}[$j]"; }
    printf OUT "%10s %-30s\n", $i, $t;
}
close OUT;

$ ./cnt_words.pl
are 0 1 1 2 1 0 0 0 0 0 0
count 1 1 2 2 2 0 1 1 0 0 0
end 0 0 0 0 0 0 0 1 0 0 0
for 0 0 0 1 0 0 0 0 0 0 0
game 0 0 0 0 0 0 0 0 0 1 0
number 1 1 0 0 1 0 1 0 0 0 0
pattern 0 0 0 1 0 0 0 0 0 0 0
text 0 0 0 0 0 0 0 2 0 0 0
with 1 2 0 1 0 0 0 1 0 1 0
work 0 0 0 1 0 0 0 0 0 0 0
$ more cnt_out.txt
are 0 1 1 2 1 0 0 0 0 0 0
count 1 1 2 2 2 0 1 1 0 0 0
end 0 0 0 0 0 0 0 1 0 0 0
for 0 0 0 1 0 0 0 0 0 0 0
game 0 0 0 0 0 0 0 0 0 1 0
number 1 1 0 0 1 0 1 0 0 0 0
pattern 0 0 0 1 0 0 0 0 0 0 0
text 0 0 0 0 0 0 0 2 0 0 0
with 1 2 0 1 0 0 0 1 0 1 0
work 0 0 0 1 0 0 0 0 0 0 0
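
To see the effect of the trailing newline in isolation, here is a small sketch. It reads from an in-memory string instead of a real keywords file, and the sample words and variable names are made up for the demonstration.

```perl
use strict;
use warnings;

# In-memory stand-in for a keywords file (assumption: one word per line).
my $keys_text = "test\ngame\n\ncount\n";
open my $fh, '<', \$keys_text or die "Cannot open in-memory file: $!\n";
my @raw = <$fh>;                    # every element still ends in "\n"
close $fh;

chomp(my @words = @raw);            # copy the list and strip the newlines
@words = grep { length } @words;    # drop the blank line

my $line = "a test of the game here";
# grep in scalar context returns how many elements matched.
my $raw_hits   = grep { $line =~ /\Q$_\E/ } @raw;
my $clean_hits = grep { $line =~ /\Q$_\E/ } @words;
print "raw: $raw_hits, chomped: $clean_hits\n";   # prints "raw: 0, chomped: 2"
```

The unchomped keywords never match because the pattern for "test\n" requires a newline right after "test".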

sdesar

ASKER

Thanks, it works! But how can I place the paragraph numbers along the top?

P1 P2 P3.... Pn
are 0 1 1 2 1 0
count 1 1 2 2 2 0
end 0 0 0 0 0 0

Thanks a million....

geotiger

Just use the following code for output:

($k) = sort keys %R; # any keyword will do: every row has the same number of columns
$t = " " x 11; # pad past the 10-character keyword column and its separator
for $j (0..$#{$R{$k}}) { $t .= sprintf " P%02d", $j; }
print "$t\n";

for $i (sort keys %R) {
    $t = "";
    for $j (0..$#{$R{$i}}) { $t .= sprintf " %3d", $R{$i}[$j]; }
    printf "%10s %-30s\n", $i, $t;
}

open OUT, ">$fn3" or die "could not write to $fn3:$!\n";
$t = " " x 11;
for $j (0..$#{$R{$k}}) { $t .= sprintf " P%02d", $j; }
print OUT "$t\n";

for $i (sort keys %R) {
    $t = "";
    for $j (0..$#{$R{$i}}) { $t .= sprintf " %3d", $R{$i}[$j]; }
    printf OUT "%10s %-30s\n", $i, $t;
}
close OUT;
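
Since the same loops are needed for the screen and for the file, one option is to build the table once as a string and print it wherever it is needed. The sketch below assumes %R has the shape used in the script (keyword => array of per-paragraph counts, with index 0 unused because the paragraph counter starts at 1). The name format_table and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: render per-paragraph counts as a text table.
# Assumes each value is an array ref whose index 0 is unused.
sub format_table {
    my ($counts) = @_;
    my @words = sort keys %$counts;
    return "" unless @words;
    my $last = $#{ $counts->{ $words[0] } };    # highest paragraph number

    my $out = sprintf "%-10s", "";              # blank cell above the keywords
    $out .= sprintf(" %3s", "P$_") for 1 .. $last;
    $out .= "\n";
    for my $w (@words) {
        $out .= sprintf "%-10s", $w;
        # Treat a missing count as 0.
        $out .= sprintf(" %3d", $counts->{$w}[$_] || 0) for 1 .. $last;
        $out .= "\n";
    }
    return $out;
}

# Made-up sample data in the same shape as %R.
my %R = (are => [undef, 0, 1, 1], count => [undef, 1, 1, 2]);
my $table = format_table(\%R);
print $table;
```

The same $table can then be written to the output file with print OUT $table; so the header and rows cannot drift apart between the two copies.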

sdesar

ASKER

Thanks, Geotiger.
Also, since my text has paragraphs, there are 2 blank lines after each one in cnt_words.txt. How can I have them treated as just one line instead of 2? The extra line is treated as a paragraph and therefore has all 0s. I also need to display the paragraph numbers:
P1 P2 P3 ..... Pn

P1 P2 P3 P4 ....
list 0 0 2 3
of 0 4 4 0
keywords 0 6 4 2
in
the

Thanks

sdesar

ASKER

Here's what the text file cnt_words.txt looks like (notice it has 2 blank lines after each paragraph). How can I have the code ignore one of the lines so that the 0s don't appear as seen above, or is there another way to avoid it?

artificial intelligence direct application problems
immediate outside ai community. example,
project (skipper) research group development
intelligent agents web elements, informational needs tastes user.

skipper project distinct ways
ongoing research efforts area intelligent web-oriented
agents. user profiles used customize form
content -line information manner meets
specific informational needs web-browsing individual. sets
skipper apart similarly minded tools fact skipper
sit background web-browser extract user profiles
manner unobtrusive, .e., requires minimal explicit
statements feedback user. unobtrusive tools

ozo

What line do you want to ignore?
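
If the aim is for a run of blank lines to count as a single paragraph break, one possibility is Perl's paragraph mode: with $/ set to the empty string, each read returns a whole paragraph and any number of consecutive blank lines acts as one separator. A minimal sketch, using an in-memory string as a stand-in for cnt_words.txt and a made-up keyword:

```perl
use strict;
use warnings;

# In-memory stand-in for cnt_words.txt: two paragraphs separated by
# TWO blank lines.
my $text = "skipper project\nweb agents\n\n\nuser profiles\nskipper tools\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!\n";

my @paragraphs;
{
    local $/ = "";          # paragraph mode: a run of blank lines is one separator
    @paragraphs = <$fh>;    # one element per paragraph
}
close $fh;

print scalar(@paragraphs), " paragraphs\n";     # prints "2 paragraphs"
for my $p (0 .. $#paragraphs) {
    # Count whole-word occurrences of the keyword in this paragraph.
    my $n = () = $paragraphs[$p] =~ /\bskipper\b/g;
    printf "P%d: skipper x %d\n", $p + 1, $n;
}
```

With this approach the paragraph number is just the array index plus one, so no blank-line test is needed in the counting loop.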

sdesar

ASKER

It works!!
Thanks geotiger!!

sdesar

ASKER

Comment accepted as answer

sdesar

ASKER

Thanks Again!!