sdesar

asked on

counting words

I know how to count words and their frequencies in a text file.
However, I do not know how to count the words and their frequencies in separate paragraphs.

Any suggestions greatly appreciated.
Thanks
ASKER CERTIFIED SOLUTION
monas
This solution is only available to members.
sdesar

ASKER

I tried this, but I am receiving the following error message:
Unrecognized file test: -n at line 3

I made some assumptions.
I typed:
#!/usr/bin/perl

per -00 -n -e '@.........."\n"list_of_files

I am assuming that list_of_files is a file that contains the text.

I thought that there was a semicolon missing after list_of_files;

However, that didn't solve it either.

Any other suggestions?
NOOOOOOO!

If you would rather put the code in a file, then write this to cnt.pl:

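#!/usr/bin/perl -00 -n
# -00 reads one paragraph at a time; -n loops over every record of the input
@w = /\w+/g;              # all the words in the current paragraph
print scalar(@w)."\n";    # number of words ($#w is the last index, one less)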


and call it from the command line:

cnt.pl file_1 file_2 file_3

where each file_X is the name of a text file whose words you want to count per paragraph.

      Good luck
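
For anyone unsure what the -00 and -n switches do, here is a minimal sketch of the same idea written out by hand. The helper names paragraphs and words_per_paragraph are made up for this illustration, and the text is read from an in-memory string rather than from files so that the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a string into paragraphs the way -00 does,
# i.e. one or more blank lines end a record.
sub paragraphs {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    local $/ = "";            # paragraph mode, same effect as -00
    my @paras = <$fh>;        # each read returns one whole paragraph
    close $fh;
    return @paras;
}

# Hypothetical helper: number of /\w+/ matches in each paragraph.
sub words_per_paragraph {
    my ($text) = @_;
    return map { my @w = /\w+/g; scalar @w } paragraphs($text);
}

my $sample = "one two three\nfour\n\n\nfive six\n";
print join(", ", words_per_paragraph($sample)), "\n";   # prints "4, 2"
```

Setting $/ to the empty string is what -00 does behind the scenes, and the read loop is what -n wraps around the script body.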
sdesar

ASKER

This seems to count only the number of words, not each word and its frequency in the individual paragraphs.

Anything else that I should do?

PS. Thanks for your time on these suggestions.
Well, you have all the words in the @w array. If you want the frequencies, then add

%wc = ();                # start with empty counts for each paragraph
map { $wc{$_}++; } @w;   # count every word in @w
foreach $wd(keys %wc){print $wd.':'.$wc{$wd}."\n";}
print "---------------\n";

This will additionally print every word and the number of occurrences of that word in the paragraph.
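
The same counting step can be sketched as a small standalone function. The name word_freq is hypothetical, and folding everything to lower case is an assumption that the snippet above does not make.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash reference mapping each word in
# $paragraph to the number of times it occurs there.
sub word_freq {
    my ($paragraph) = @_;
    my %wc;                                   # fresh hash on every call
    $wc{lc $_}++ for $paragraph =~ /\w+/g;    # assumption: fold case
    return \%wc;
}

my $freq = word_freq("The cat saw the other cat");
print "$_:$freq->{$_}\n" for sort keys %$freq;
# prints:
# cat:2
# other:1
# saw:1
# the:2
```

Because %wc is declared inside the function, counts from one paragraph cannot leak into the next.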
ozo
What's a word?
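
To make the question concrete: /\w+/ treats any run of letters, digits and underscores as a word, so apostrophes and hyphens split tokens. The sketch below compares it with one possible alternative pattern; that second pattern is only an assumption for illustration, not something proposed in this thread.

```perl
use strict;
use warnings;

my $text = "Don't re-use the text_book in 2 places.";

# \w matches letters, digits and underscore only.
my @simple = $text =~ /\w+/g;
print join("|", @simple), "\n";
# prints "Don|t|re|use|the|text_book|in|2|places"

# Assumed alternative: runs of ASCII letters, optionally joined by a
# single apostrophe or hyphen.
my @letters = $text =~ /[A-Za-z]+(?:['-][A-Za-z]+)*/g;
print join("|", @letters), "\n";
# prints "Don't|re-use|the|text|book|in|places"
```

Which definition is right depends on the text being counted.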
sdesar

ASKER

I tested this and it works.
How do I list the paragraph numbers? For example:
Parah1
   word      freq
Parah2
   word      freq
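#!/usr/bin/perl -00 -n
print "Parah ", $., ":";   # $. is the current paragraph number under -00
@w = /\w+/g; print scalar(@w)."\n";
%wc = ();                  # reset the counts for each paragraph
map { $wc{$_}++; } @w;
foreach $wd (keys %wc) { print "\t".$wd."\t".$wc{$wd}."\n"; }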
sdesar

ASKER

Thanks monas!!
I gave U excellent points.  
Have they been recorded?
Yes, TNX
sdesar

ASKER

How can I use perl for
word recognition?

Example - If there are a bunch of words in a text file like -

this text is derived from the book and to see from information on deriving check out the textbook.

Since "derived" and "deriving" both stem from the root "derive", how can I use Perl to parse the text and recognize DERIVE?
use Lingua::Stem qw(:all);
set_locale('en');
#add_exceptions({derived=>'DERIVE', deriving=>'DERIVE'});
#print "@{stem(qw(Since derived and deriving stem from the root - derive.  How can I use perl to parse the text and recognize DERIVE'))}\n";
print "@{stem(map{/(\w+)/g}<>)}\n";
sdesar

ASKER

Thanks, ozo!!
It works as expected.  

ozo or monas:
I am implementing the paragraph and word counting program in a web application, and I wanted to know how this routine will handle multiple files.
That is, suppose I have one text file, file_1.in, that I want the word and frequency count for, with the result saved in file_1.out.
Then I want to generate a similar count for another file, file_2.in, and store the results in file_2.out.
What is an efficient way to handle a frequency count on multiple files?
Also, is map() a function in Perl, and is it using a list data structure to perform the word counts?
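
One way this could be approached, as a rough sketch rather than a definitive answer: open each input file explicitly instead of relying on -n, and derive the output name from the input name. The helper name count_file and the rule of replacing a trailing .in with .out are assumptions taken from the file names in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read $in paragraph by paragraph and write a
# per-paragraph word/frequency report to $out.
sub count_file {
    my ($in, $out) = @_;
    open my $ifh, '<', $in  or die "cannot read $in: $!";
    open my $ofh, '>', $out or die "cannot write $out: $!";
    local $/ = "";                        # paragraph mode, like -00
    while (my $para = <$ifh>) {
        my %wc;                           # counts for this paragraph only
        my @w = $para =~ /\w+/g;
        $wc{$_}++ for @w;
        print {$ofh} "Parah ", $., ": ", scalar(@w), "\n";
        print {$ofh} "\t$_\t$wc{$_}\n" for sort keys %wc;
    }
    close $ifh;
    close $ofh or die "cannot close $out: $!";
}

# Assumed naming convention from the question: file_1.in -> file_1.out
for my $in (@ARGV) {
    (my $out = $in) =~ s/\.in$/.out/ or next;   # skip names not ending in .in
    count_file($in, $out);
}
```

It could be called as "perl count_files.pl file_1.in file_2.in", where count_files.pl is a made-up script name. Since each file gets its own filehandle, the paragraph numbers restart at 1 for every file. On the second question: map is a Perl built-in that evaluates a block for each element of a list and returns the list of results. In the snippets above it is used only for its side effect of incrementing entries in the hash %wc, which is where the counts are actually stored, so this sketch uses a plain for loop instead.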