Solved

A little sophisticated Word Counting, based on map{}.

Posted on 2003-03-28
Last Modified: 2010-08-05
Hello!  Everyone!

To understand activity across geographical areas, I want to analyse the following file, or files like it, statistically.   The data file is called "agridata.txt", the analysis program is called "scr37.pl", and the output file is called "agri-cnt.txt".   I will post all 3 of them in this order.   The resulting file only partly serves my purpose.   Further below I will post a "WHAT-I-WANT" file, which combines the counts with the geographic data and outputs them together in a comprehensive way.   I have to hold the geographical area in a variable, so that counting goes on while remembering which area each item belongs to, and then output everything together.   How do you suggest modifying "scr37.pl"?   Should I put map{$WordCount{$_}++} in a subroutine?   Thank you for your help!


Mitsuru Kido

=====  "agridata.txt" ===========================
GEO: Georgia
Poultry
Cotton
Peanuts

GEO: Florida
Fruits
Rice

GEO: North-Carolina
Hogs
Poultry
Cotton
Tabacco

GEO: Indiana
Corn
Soybeans
Hogs

GEO: Oklahoma
Wheat
Beef
Cotton

GEO: Mississippi
Cotton
Corn
Poultry

=====  scr37.pl  =======================================
#! /usr/bin/perl -w
# scr37.pl
# Generalized Counter by using a Hash
# Read "agridata.txt" file.
# Write a result in "agri-cnt.txt" file.

print ("Hello, world!\n");

open(INFILE, "agridata.txt");
open(OUT, ">agri-cnt.txt");
while(<INFILE>){
   map{$WordCount{$_}++} split;
}

foreach (sort keys %WordCount){
   print OUT "$_ seen $WordCount{$_} times \n";
}


=====  "agri-cnt.txt" ==============================
Beef seen 1 times
Corn seen 2 times
Cotton seen 4 times
Florida seen 1 times
Fruits seen 1 times
GEO: seen 6 times
Georgia seen 1 times
Hogs seen 2 times
Indiana seen 1 times
Mississippi seen 1 times
North-Carolina seen 1 times
Oklahoma seen 1 times
Peanuts seen 1 times
Poultry seen 3 times
Rice seen 1 times
Soybeans seen 1 times
Tabacco seen 1 times
Wheat seen 1 times


=====  "WHAT-I-WANT" =====================================
Beef seen 1 times ----- Oklahoma
Corn seen 2 times ----- Indiana, Mississippi
Cotton seen 4 times ----- Georgia, North-Carolina, Oklahoma, Mississippi
Fruits seen 1 times ----- Florida
Hogs seen 2 times ----- North-Carolina, Indiana
Peanuts seen 1 times ----- Georgia
Poultry seen 3 times ----- Georgia, North-Carolina, Mississippi
Rice seen 1 times ----- Florida
Soybeans seen 1 times ----- Indiana
Tabacco seen 1 times ----- North-Carolina
Wheat seen 1 times ----- Oklahoma

END of MY POSTING
Question by:mkido
9 Comments
 
LVL 5

Accepted Solution

by:
PC_User321 earned 200 total points
ID: 8230003
open(INFILE, "agridata.txt");
open(OUT, ">agri-cnt.txt");
while(<INFILE>){
  if (/GEO: (.*)/) {
    $Origin = $1;
  } elsif (/^(.+)$/) {     # Not a blank line
    push(@{$ProductOrigin{$1}},$Origin);
  }
}

foreach (sort keys %ProductOrigin){
   printf("$_  seen %d times ----- %s\n", $#{$ProductOrigin{$_}} +1, join(', ', @{$ProductOrigin{$_}}));
}
 

Author Comment

by:mkido
ID: 8239391
Hello! PC_User321
I will test your code in a few days.  I am a novice in Perl, so it takes time.   I believe "scr37.pl" was also written by you.   Thank you very much.  Please wait for my test results.  mkido.
 
LVL 5

Expert Comment

by:PC_User321
ID: 8245358
>> I will test your code in few days.
Good luck.

>> "scr37.pl" was written also by you, I believe
Two years ago I answered http://www.experts-exchange.com/Programming/Programming_Languages/Perl/Q_20083048.html for you.  You must know lots of Perl by now!!

Author Comment

by:mkido
ID: 8255676
Dear PC_User321

I typed your code into my Linux machine and ran it under the debugger; here is my first report.   The output appeared on the screen as shown below.   It DID NOT write to the "agri-cnt.txt" file, so I redirected the screen output to a file named "agri-cnt.txt" and viewed it with "more" in the Linux shell.   It looked like the first listing below.   Then, to post the result to you, I copied the file from Linux to my PC and opened the same "agri-cnt.txt" there.   To my surprise, it looked perfect, as in the second listing further below.   I still need to study your code, but this is my first response to you.

=== On Linux screen, and by "more" ===

,Mississippiina---- Georgia
 seen 1 times ----- Oklahoma
,Mississippis ----- Indiana
,Mississippiina---- Georgia
 seen 1 times ----- Florida
,Indianatimes ----- North-Carolina
 seen 1 times ----- Georgia
,Mississippiina---- Georgia
 seen 1 times ----- Florida
 seen 1 times ----- Indiana
 seen 1 times ----- North-Carolina
 seen 1 times ----- Oklahoma


=== Pico and Ex of Linux as well as Note-Pad and Word of PC ===

Hello, world!
 seen 6 times ----- Georgia,Florida,North-Carolina,Indiana,Oklahoma,Mississippi
Beef seen 1 times ----- Oklahoma
Corn seen 2 times ----- Indiana,Mississippi
Cotton seen 4 times ----- Georgia,North-Carolina,Oklahoma,Mississippi
Fruits seen 1 times ----- Florida
Hogs seen 2 times ----- North-Carolina,Indiana
Peanuts seen 1 times ----- Georgia
Poultry seen 3 times ----- Georgia,North-Carolina,Mississippi
Rice seen 1 times ----- Florida
Soybeans seen 1 times ----- Indiana
Tabacco seen 1 times ----- North-Carolina
Wheat seen 1 times ----- Oklahoma

mkido
 
LVL 5

Expert Comment

by:PC_User321
ID: 8260693
Hi mkido,
The script did not write to the file because I forgot to include the file handle (OUT).  I should have said:
    printf(OUT "$_  seen %d times ----- %s\n", $#{$ProductOrigin{$_}} +1, join(', ', @{$ProductOrigin{$_}}));

I can't explain why viewing the file with "more" gave a strange result.

Another mystery is the first line -
   "seen 6 times ----- Georgia,Florida,North-Carolina,Indiana,Oklahoma,Mississippi"
In agridata.txt are there perhaps blank lines or lines with only white space?

Another mystery is the fact that there is just a comma between the state names instead of a comma and a space.
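
A possible explanation for the first two mysteries, offered as a guess rather than something confirmed in the thread: if agridata.txt was saved with Windows-style line endings (CRLF), every captured name would end in a carriage return. That would fit the ^M characters mkido reports in the next comment. On a terminal, a carriage return moves the cursor back to the start of the line, so "more" would show later text overwriting earlier text. A "blank" line would contain just the carriage return, which /^(.+)$/ matches, giving a key that looks empty and is counted once per blank line. The sketch below strips trailing whitespace (including the carriage return) before matching. The helper name read_products and the in-memory sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read GEO/product records from an
# open filehandle and return a reference to a hash of arrays.
sub read_products {
    my ($fh) = @_;
    my (%product_origin, $origin);
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;            # strip trailing whitespace, including \r and \n
        next unless $line =~ /\S/;     # skip blank or whitespace-only lines
        if ($line =~ /^GEO:\s*(.+)/) {
            $origin = $1;              # remember the current area
        } elsif (defined $origin) {
            push @{ $product_origin{$line} }, $origin;
        }
    }
    return \%product_origin;
}

# Sample input with CRLF line endings, read through an in-memory filehandle.
my $data = "GEO: Georgia\r\nPoultry\r\nCotton\r\n\r\n"
         . "GEO: Florida\r\nRice\r\n\r\n"
         . "GEO: Oklahoma\r\nCotton\r\n";
open(my $in, '<', \$data) or die "Cannot open in-memory file: $!";
my $products = read_products($in);
close($in);

foreach my $name (sort keys %$products) {
    printf("%s seen %d times ----- %s\n",
        $name, scalar @{ $products->{$name} }, join(', ', @{ $products->{$name} }));
}
```

This does not account for the missing space after the commas; that may simply come from how the code was retyped.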
 

Author Comment

by:mkido
ID: 8276853
Dear PC_User321

I finished the work I wanted to do.  It was a big file, and not only agriculture data.  Wonderful!  My guess is that the funny printing on the screen by "more" is due to my Linux.  Just for your interest, there is a ^M (caret and upper-case M) at the end of every variable word when I view "agri-cnt.txt" in the "pico", "ex" and "emacs" editors (for example, Beef^M, Corn^M, Georgia^M, Indiana^M, and so on).   But it did no harm when I transferred "agri-cnt.txt" to the PC and printed it.  I have not completely understood your code, so before closing let me ask just one more question.

   push(@{$ProductOrigin{$1}},$Origin);
 
I learned that a hash % is a fancy kind of array, so I can select an individual hash element by enclosing the key in braces, such as $longday{"Wed"} ("PP" pg. 8).  But your code

   @{$ProductOrigin{$1}}

is an array @ (so we can push), but I don't know how enclosing {$1} works.  I would be grateful if you have time to explain this to a novice.   mkido
 
LVL 5

Expert Comment

by:PC_User321
ID: 8277050
OK, you understand that a hash is a fancy type of array, and that there is a 'Value' associated with every key.  Now the trick that I am using is that my Values are not scalars (like '5' or 'Corn'), but arrays (or, strictly speaking, references to arrays).

So, for example, the value of hash element
    $ProductOrigin{Poultry}
is a reference to the array
    (Georgia, North-Carolina, Mississippi)
and @{$ProductOrigin{Poultry}} is that array itself.

You can learn more about hashes of arrays by running the command perldoc perllol.
You may find the result difficult to understand.  You could also try perldoc perldsc.
With ActiveState Perl these documents are also available from
Start->Programs->ActiveState Active Perl->Documentation then select perllol or perldsc.
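
To make the braces concrete, here is a small sketch (not part of the original answer) that builds one entry of such a hash by hand and looks at it in several ways. The variable names $ref, @states, $count, $last and $first are made up for illustration.

```perl
use strict;
use warnings;

my %ProductOrigin;   # hash: product name => reference to an array of states

# The first push creates the array automatically (autovivification).
push @{ $ProductOrigin{Poultry} }, 'Georgia';
push @{ $ProductOrigin{Poultry} }, 'North-Carolina';
push @{ $ProductOrigin{Poultry} }, 'Mississippi';

my $ref    = $ProductOrigin{Poultry};              # the stored value: an array reference
my @states = @{ $ProductOrigin{Poultry} };         # dereferenced: a copy of the array
my $count  = scalar @{ $ProductOrigin{Poultry} };  # number of elements
my $last   = $#{ $ProductOrigin{Poultry} };        # last index, i.e. count - 1
my $first  = $ProductOrigin{Poultry}[0];           # a single element

print "value is a reference of type ", ref($ref), "\n";
print "$count states, last index $last, first is $first\n";
print join(', ', @states), "\n";
```

In the accepted answer, $1 plays the role of the literal key Poultry: $ProductOrigin{$1} selects the hash element for the product just matched, and the surrounding @{ ... } treats that element as an array so that push can append to it.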
 

Author Comment

by:mkido
ID: 8317319
Hi!  PC_User321.
I am getting it, that your code has a nested Hash in Array.   Am I right?   Thank you, and I close this.   mkido
 
LVL 5

Expert Comment

by:PC_User321
ID: 8317554
Thanks for the points.
In my code I have many arrays, all stored in one hash.
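
As an illustration of "many arrays, all stored in one hash", the sketch below fills such a hash directly and writes the report through a filehandle. It is a variant, not the code from the accepted answer: the helper name write_report is hypothetical, the sample data is a small subset, and an in-memory filehandle stands in for "agri-cnt.txt" so the example needs no files.

```perl
use strict;
use warnings;

# Sample structure: several arrays, each stored (by reference) under one hash key.
my %ProductOrigin = (
    Cotton => [ 'Georgia', 'North-Carolina', 'Oklahoma', 'Mississippi' ],
    Rice   => [ 'Florida' ],
);

# Hypothetical helper: write one report line per product to the given filehandle.
sub write_report {
    my ($out, $table) = @_;
    foreach my $product (sort keys %$table) {
        my @origins = @{ $table->{$product} };
        # The product name is passed as an argument rather than placed in the
        # format string, so a '%' in the data cannot be read as a format code.
        printf {$out} "%s seen %d times ----- %s\n",
            $product, scalar(@origins), join(', ', @origins);
    }
}

my $report = '';
open(my $out, '>', \$report) or die "Cannot open in-memory file: $!";
write_report($out, \%ProductOrigin);
close($out);
print $report;
```

To write to a real file instead, the open call would take a file name such as ">agri-cnt.txt" in place of the reference to $report.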