
counting words

I know how to count words and their frequencies in a text file.
However, I do not know how to count the words and their frequencies in each paragraph separately.

Any suggestions greatly appreciated.
Thanks

ASKER CERTIFIED SOLUTION

monas


sdesar

ASKER

I tried this, but I am receiving the following error message:
Unrecognized file test: -n at line 3

I made some assumptions.
I typed:
#!/usr/bin/perl

per -00 -n -e '@.........."\n"list_of_files

I am assuming that list_of_files is a file that contains the text.

I thought that a semicolon was missing after list_of_files.

However, that didn't solve it either.

Any other suggestions?

monas

NOOOOOOO!

If you would rather put the code in a file, write this to cnt.pl:

#!/usr/bin/perl -00 -n
@w = /\w+/g;print scalar(@w)."\n";   # scalar(@w) is the word count ($#w is the last index, one less)

and call it from the command line:

cnt.pl file_1 file_2 file_3

where each file_X is the name of a text file whose words you want to count per paragraph.

Good luck

sdesar

ASKER

This seems to count only the words, not the words and their frequencies in the individual paragraphs.

Anything else that I should do?

PS. Thanks for your time on these suggestions.

monas

Well, you have all the words in the @w array. If you want frequencies, then add:

%wc = ();                 # reset the counts so each paragraph starts from zero
map { $wc{$_}++; } @w;
foreach $wd(keys %wc){print $wd.':'.$wc{$wd}."\n";}
print "---------------\n";

This will additionally print every word and the number of occurrences of that word in the paragraph.
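
For reference, here is a minimal standalone sketch of the same idea that runs under strict and does not depend on the -00 -n switches. It assumes a "word" is a run of \w characters, as in /\w+/g above. The names count_words and split_paragraphs and the sample text are made up for illustration and are not from the posts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each word occurs in one paragraph.
# Returns a reference to a hash of word => count.
sub count_words {
    my ($paragraph) = @_;
    my %freq;
    $freq{$_}++ for $paragraph =~ /\w+/g;
    return \%freq;
}

# Split text into paragraphs on blank lines, which is roughly
# what the -00 switch does when reading input records.
sub split_paragraphs {
    my ($text) = @_;
    return grep { /\S/ } split /\n\s*\n/, $text;
}

# Sample input, made up for illustration.
my $text = "the cat saw the dog\n\nthe dog ran\nthe dog hid\n";

my $n = 0;
for my $para (split_paragraphs($text)) {
    $n++;
    my $freq = count_words($para);   # a fresh hash for every paragraph
    print "Paragraph $n\n";
    print "\t$_\t$freq->{$_}\n" for sort keys %$freq;
}
```

Because count_words builds a new hash on every call, the counts of one paragraph never leak into the next.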

ozo

What's a word?

sdesar

ASKER

I tested this and it works.
How do I list the paragraph numbers, like this?
Parah1
word freq
Parah2
word freq

monas

#!/usr/bin/perl -00 -n
print "Parah ".$..":";
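@w = /\w+/g;print scalar(@w)."\n";   # scalar(@w) is the word count ($#w would be one less)
%wc = (); map { $wc{$_}++; } @w;     # reset %wc so counts do not carry over between paragraphs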
foreach $wd(keys %wc){print "\t".$wd."\t".$wc{$wd}."\n";}

sdesar

ASKER

Thanks, monas!!
I gave you excellent points.
Have they been recorded?

monas

Yes, TNX

sdesar

ASKER

How can I use Perl for word recognition?

Example: suppose there is a bunch of words in a text file like this:

this text is derived from the book and to see from information on deriving check out the textbook.

Since derived and deriving both stem from the root derive, how can I use Perl to parse the text and recognize DERIVE?

ozo

use Lingua::Stem qw(:all);
set_locale('en');
#add_exceptions({derived=>'DERIVE', deriving=>'DERIVE'});
#print "@{stem(qw(Since derived and deriving stem from the root - derive. How can I use perl to parse the text and recognize DERIVE))}\n";
print "@{stem(map{/(\w+)/g}<>)}\n";

sdesar

ASKER

Thanks, ozo!!
It works as expected.

ozo or monas,
I am implementing the paragraph and word counting program in a web application.
I wanted to know how this routine will handle multiple files.
That is, if I have one text file, file_1.in, that I want the word and frequency count on, I save the results in file_1.out.
Then, if I want to generate a similar count on another file, file_2.in, I store the results in file_2.out.
What is an efficient way to handle frequency counts on multiple files?
Also, is map() a function in Perl, and is it using a list data structure to perform the word counts?
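
On the last point: map is a built-in Perl function that evaluates a block once for each element of a list. In the scripts earlier in the thread the list is @w, and the counts themselves are kept in the hash %wc. For multiple files, one possible approach is to process each input file in turn and write its counts to a matching output file. The following is only a sketch under assumptions: inputs are named like file_1.in, and out_name and count_file are hypothetical helper names, not part of the earlier answers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical naming rule: file_1.in -> file_1.out.
# Any name that does not end in .in just gets .out appended.
sub out_name {
    my ($in) = @_;
    my $out = $in;
    $out .= ".out" unless $out =~ s/\.in\z/.out/;
    return $out;
}

# Write per-paragraph word frequencies for one input file
# to one output file. Returns the number of paragraphs read.
sub count_file {
    my ($in, $out) = @_;
    open my $ifh, '<', $in  or die "Cannot read $in: $!";
    open my $ofh, '>', $out or die "Cannot write $out: $!";
    local $/ = "";    # paragraph mode, the same effect as the -00 switch
    my $n = 0;
    while (my $para = <$ifh>) {
        $n++;
        my %wc;       # fresh hash for every paragraph
        $wc{$_}++ for $para =~ /\w+/g;
        print {$ofh} "Parah $n\n";
        print {$ofh} "\t$_\t$wc{$_}\n" for sort keys %wc;
    }
    close $ifh;
    close $ofh or die "Cannot close $out: $!";
    return $n;
}

# Each file named on the command line gets its own .out file.
count_file($_, out_name($_)) for @ARGV;
```

Saved under a name of your choice, it would be called with the input files as arguments, for example file_1.in file_2.in, and would leave file_1.out and file_2.out next to them. Each file is read once and only one paragraph is held in memory at a time.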