
How to create a hash of arrays

I have a file that has common date entries. I need to create a hash of arrays based on the date and don't know how to do this.

This is not a homework assignment. I appreciate your help. Thank you in advance.

BioI

you can create a hash of arrays with the following syntax:
$hash{$key} = [@values];

and you can access an element of this hash with the following syntax, e.g. element $i of the array that corresponds to key $key:
$hash{$key}[$i]

BioI

and you also can do the same things as you can do with usual arrays. E.g. the equivalent of:
$#array for @array becomes $#{$hash{$key}} for $hash{$key}
scalar(@array) becomes scalar(@{$hash{$key}})
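
A minimal sketch that puts the syntax above together. The dates and numbers are made-up sample data, and the variable names are only for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data: dates as keys, lists of values as the arrays.
my %hash;
my @values = (10, 20, 30);

$hash{'2003-11-10'} = [@values];        # store a copy of @values under the key
push @{ $hash{'2003-11-11'} }, 5, 7;    # push creates the array on first use

my $second = $hash{'2003-11-10'}[1];           # element 1 of the stored array
my $last   = $#{ $hash{'2003-11-10'} };        # index of the last element
my $count  = scalar @{ $hash{'2003-11-10'} };  # number of elements

print "second=$second last_index=$last count=$count\n";
```

Using [@values] stores a copy, so later changes to @values do not affect what is in the hash.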

uluttrell

ASKER

When I run the code, I get the following: ARRAY(0X1A8F144).
My code is attached.
===Begin code.pl
#!/usr/bin/perl
use strict;
use warnings;
my %hashofarray;
my @array;
$hashofarray{3} = [@array_two];
print [@hashofarray];
===End code.pl

BioI

I don't know what you want to print exactly, but if you want to print the array that is saved with key "3", you should use:
print @{$hashofarray{3}};

if you want to print the complete hash with all the corresponding arrays, you need:
foreach my $key (keys %hashofarray) {
print "key -> $key: array: @{$hashofarray{$key}} \n";
}
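
The ARRAY(0x...) output appears whenever a reference itself is printed instead of the array it points to. A small sketch of the difference, assuming some made-up contents for key 3:

```perl
use strict;
use warnings;

my %hashofarray = (3 => ['a', 'b', 'c']);   # hypothetical contents

my $as_reference = "$hashofarray{3}";        # the reference as a string: ARRAY(0x...)
my $as_list      = "@{ $hashofarray{3} }";   # the elements, joined with spaces

print "reference: $as_reference\n";
print "elements:  $as_list\n";

# Sorting the keys gives a stable order; plain keys() returns them unordered.
foreach my $key (sort keys %hashofarray) {
    print "key -> $key: array: @{ $hashofarray{$key} }\n";
}
```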

uluttrell

ASKER

The format of the file that I'm using is
servername, date, valueA, valueB, etc
I want to group all entries that have the same date.

BioI

#!/usr/bin/perl -w

use strict;

my $file = "/path/to/your/file.ext";
open FILE, $file or die "Unable to open file: $!\n";
my %hashofarray;
while (<FILE>) {
# assume a tab-delimited file: date \t valueA \t valueB \t ...
chomp;
my $line = $_;
my ($date, @values) = split (/\t/, $line);
$hashofarray{$date} = [@values];
print "key -> $date: values: @{$hashofarray{$date}} \n";
}

BioI

sorry, I didn't see the servername:
change these two lines in the script:

my ($servername, $date, @values) = split (/\t/, $line);
$hashofarray{$date} = [$servername, @values];

FishMonger

Since the file seems to be comma separated (with a possible space), shouldn't the split look like this:

split /,\s*/, $line

uluttrell

ASKER

Thanks for the correction FishMonger. How would the script be altered if I need the hash of array to be printed in a format similar to the file's format?

FishMonger

This isn't the best method, but it prints it out in the same format.

while (<FILE>) {
chomp;
my $line = $_;
my ($servername, $date, @values) = split (/,\s*/, $line);
$hashofarray{$date} = [$servername, @values];
print "$date, ";
print join ", ", @{$hashofarray{$date}};
print "\n";
}
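
Note that assigning to $hashofarray{$date} replaces any earlier row stored for the same date. A sketch that keeps every row per date by pushing onto the array instead; it assumes in-memory sample lines (hypothetical data) rather than a file, and %rows_by_date is just an illustrative name:

```perl
use strict;
use warnings;

# Hypothetical sample lines in the "servername, date, valueA, valueB" format.
my @lines = (
    "serverA, 2003-11-10, 1, 2\n",
    "serverB, 2003-11-11, 3, 4\n",
    "serverC, 2003-11-10, 5, 6\n",
);

my %rows_by_date;
foreach my $line (@lines) {
    chomp $line;
    my ($servername, $date, @values) = split /,\s*/, $line;
    # Each date maps to an array of rows; each row is itself an array reference.
    push @{ $rows_by_date{$date} }, [$servername, @values];
}

# Print the rows grouped by date, in a format similar to the input.
foreach my $date (sort keys %rows_by_date) {
    foreach my $row (@{ $rows_by_date{$date} }) {
        print join(", ", $date, @$row), "\n";
    }
}
```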

uluttrell

ASKER

what is the best method FishMonger?

FishMonger

I'm late for work, so I can't do any testing at the moment, but I'll need to know a little more about what you are trying to accomplish and see a more complete example of your code in order to suggest other possible [better] alternatives.

uluttrell

ASKER

Did not mean to keep you. Here is what I'm trying to do. I have two patterns in a file. Depending on the matched pattern, all lines that match that pattern are placed in one of two arrays.

Then I need to index the two arrays by date because the latest date has the desired value.

Here is my attempt to code it.
==Begin code.pl
#!/usr/bin/perl -w
my @hashofarray;
my @newarray;
my @one_array;
my @two_array;
my $line;
use strict;

my $file = "input3.txt";
open FILE, "$file" || die "Unable to open file: $!\n";
my %hashofarray;
while (<FILE>) {
chomp;
my $line = $_;
my ($servername, $date, @values) = split (/,\s*/, $line);
$hashofarray{$date} = [$servername, @values];
#print "$date, ";
#print join ", ", @{$hashofarray{$date}};
@{$hashofarray{$date}} = @newarray;
#print "\n";
}
while (<@newarray>){
if ($line =~ m%serveri..%) {
push @one_array, $line;
} else {
push @two_array, $line;
}
}
print @one_array;
==End code.pl

BioI

Sorry, but I don't understand some part of your script. When you do:
$hashofarray{$date} = [$servername, @values];
you assign an array containing $servername and @values to the hash under the key $date. But it seems to me that you are trying to do the same thing a few lines further on:
@{$hashofarray{$date}} = @newarray;
I don't know what you want to do here [I think @newarray is empty, no?]. The statement is valid Perl, but it replaces the contents of the array you just stored with the contents of @newarray, which is empty at that point. If you want to assign a copy of @newarray as the value for the key $date, you should do:
$hashofarray{$date} = [@newarray];

When you want to print the different fields, you can also use the printf function if you know how long the fields are. But this is just a "layout" thing. When you just want to print the content of the hash without bothering about layout, FishMonger's method works perfectly.

If I understand you correctly, you first want to store all the data from a file in a hash of arrays [%hashofarray]. In the second step (starting from the if statement) you want to check whether the servername matches your pattern "serveri...". If I am correct in guessing your aim, you should do something like this:

foreach my $date (keys %hashofarray) {
if ($hashofarray{$date}[0] =~ m/serveri.../) {
push(@one_array, $date);
}else{
push(@two_array, $date);
}
}

But now I only added the date to one of the two arrays: if you want to add more information, you can use a join-function as mentioned by FishMonger. E.g. when you want to store all the information in the array, you can use:
foreach my $date (keys %hashofarray) {
    # build a plain list with parentheses; [ ... ] would create a single array reference
    my @array = ($date, @{$hashofarray{$date}});
    my $line = join (", ", @array);
    if ($hashofarray{$date}[0] =~ m/serveri.../) {
        push(@one_array, $line);
    }else{
        push(@two_array, $line);
    }
}
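
A self-contained sketch of this partitioning step that can be run as-is. The hash contents are invented for illustration, and the pattern is simplified to /^serveri/:

```perl
use strict;
use warnings;

# Hypothetical hash of arrays: date => [servername, valueA, valueB].
my %hashofarray = (
    '2003-11-10' => ['serveri01', 1, 2],
    '2003-11-11' => ['serverx02', 3, 4],
    '2003-11-12' => ['serveri03', 5, 6],
);

my (@one_array, @two_array);
foreach my $date (sort keys %hashofarray) {
    # A plain list (parentheses), not an anonymous array (square brackets).
    my @fields = ($date, @{ $hashofarray{$date} });
    my $line   = join ", ", @fields;
    if ($hashofarray{$date}[0] =~ m/^serveri/) {
        push @one_array, $line;
    } else {
        push @two_array, $line;
    }
}

print "$_\n" for @one_array;
```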

FishMonger

Both your description and script are confusing and, like BioI, I still don't fully understand what you're trying to accomplish.

1) Are the @one_array and @two_arrays being used elsewhere in your script?

2) To me, it seems that those arrays are duplications of what's in the hash...right? If so, WHY?

3) If you have multiple lines with the same date, do you want to store each of them in the hash or just the latest date that has the desired values?

4) Assuming that BioI's understanding is correct, rather than splitting and joining the line with the same delimiter, we can use a regex to extract the info and if need be, rearrange the order of the fields.

5) Assuming that I understand what you need to do (which I'm not sure that I do), you can drop the arrays and use a couple of scalars and a regular hash. If you want more flexibility in separating the data, you can use a hash of hashes instead of the hash of arrays; the date and server name would be the keys and the rest of the line would be the "final" value.

I'm still at work, so I can't do any testing, but when I get home, I'll run a couple of tests based on my [possibly false] assumptions.

FishMonger

uluttrell,

I just got home from work and need to be back in 7 hours, so I'll wait until tomorrow to do the testing. If you can help clear up some of the confusion about what you need, I'll be able to provide you with a proper answer.

uluttrell

ASKER

FishMonger,
Answer to question 1) I want to use the @one_array and @two_arrays to sum the values of select fields elsewhere in the script. I have not included that because I think I have an idea of how to code that.

Answer to question 2) Those arrays are duplications of what's in the hash. I did it that way so I could see that the hash picked up the correct values. I'm learning this and need the visual picture as proof that I'm coding it correctly.

Answer to question 3) I want to store each of the lines with the same date so that I can sum select fields.

Answer to question 4) BioI's understanding is correct.

Answer to question 5) That's fine.

Thank you FishMonger and BioI for your help. You're helping me to learn Perl :)

BioI

Maybe it is a good idea to show part of your input file here? That would make it easier for us to help you.
As you mentioned previously, your input file format looks like:
line 1: servername1, date1, valueA1, valueB1, valueC1
line 2: servername2, date2, valueA2, valueB2, valueC2
line 3: servername3, date1, valueA3, valueB3, valueC3

E.g. lines 1 and 3 have the same date. Which information from these lines do you want to have stored?
Or is my picture of the input file not correct, and does it look like this:
line 1: servername1, date1, valueA
line 2: servername2, date2, valueB
line 3: servername3, date1, valueC

uluttrell

ASKER

BioI, Your first data representation is correct. I want to sum all of ValueA's for the same date.

BioI

Do you also want to keep the servername stored? Because writing the script would be much easier when you only have to worry about the date and (let's call it) "valueA".
But I guess you want the server name, and then it is much easier to create a hash of hashes [like FishMonger already noticed]. Your "parent" hash contains as keys all the different dates, and each "child" hash contains value A. Only disadvantage here is that every server can only have one value. If one server can have different values, you have to create a hash of hashes of arrays [very complex and don't know whether this works in perl, FishMonger, plz help me out here ;-)]
This is the example when every server can have one value:

my $file = "input3.txt";
open FILE, $file or die "Unable to open file: $!\n";
my %hashofarray;
while (<FILE>) {
chomp;
my $line = $_;
my ($servername, $date, $valueA, @values) = split (/,\s*/, $line);
$hashofarray{$date}{$servername} = $valueA;
}

Remark: all values other than valueA are ignored [stored in @values but not used].
Now you can loop through the values for a certain server, using this one:
[but I don't know what the aim is, so next part will be probably wrong...]

foreach my $date (keys %hashofarray) {
foreach my $server (keys %{$hashofarray{$date}}) {
if ($server =~ m/serveri.../) {
#do your thing
}else{
# do the other thing
}
}
}
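
On the question of a hash of hashes of arrays: Perl does allow this nesting, and the inner arrays are created automatically the first time something is pushed onto them. A minimal sketch, assuming invented sample rows in which one server reports several values on the same date (%values_for and %sum_for are illustrative names):

```perl
use strict;
use warnings;

# Hypothetical rows: servername, date, valueA (a server may repeat on a date).
my @lines = (
    "serveri01, 2003-11-10, 4",
    "serveri01, 2003-11-10, 6",
    "serverx02, 2003-11-10, 1",
    "serveri01, 2003-11-11, 3",
);

my %values_for;   # date => servername => [valueA, valueA, ...]
foreach my $line (@lines) {
    my ($servername, $date, $valueA) = split /,\s*/, $line;
    push @{ $values_for{$date}{$servername} }, $valueA;
}

my %sum_for;      # date => servername => sum of valueA
foreach my $date (sort keys %values_for) {
    foreach my $servername (sort keys %{ $values_for{$date} }) {
        $sum_for{$date}{$servername} += $_ for @{ $values_for{$date}{$servername} };
        print "$date $servername: $sum_for{$date}{$servername}\n";
    }
}
```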

uluttrell

ASKER

BioI, I would like to keep the server names.

BioI

Yes, but the servername is still stored, namely as key of the "child" hash.

%hashofarray -> keys: dates
-> values: "child" hashes
%{$hashofarray{$date}} -> keys: servernames
-> values: valuesA

Can one server have different values for one date?

FishMonger

Things have been crazy for me here at work so I haven't had time to work on this but as soon as I get home, I'll look into it.

BioI

yes FishMonger, hurry up, we need your expert knowledge here ;-)
seems we are in different time zones, because here it is 01:00 at night and time to catch some sleep.
uluttrell, quick two questions [I think important questions to solve your problem]
1) for one specific date, can a server have different values? Or is there for every server on one day only one value?
2) what do you want to count: the sum of the values of a specific server for one day?

If you can answer these questions, I can continue the work on this question tomorrow (or FishMonger this evening)
CU

FishMonger

How about if we simplify the data structure and just use a regular hash using the date as the keys and the values will be a concatenation of each of the rows for that date?

use strict;

my (%hash, $total);
my $file = "input3.txt";

open FILE, $file or die "Unable to open file: $!\n";

while (<FILE>) {
my ($server, $date, $values) = split (/,\s?/, $_, 3);
$hash{$date} .= "$server, $values";
}

for my $date (sort keys %hash) {
next unless $hash{$date} =~ /^serveri../i; # skip over unwanted servers
my @rows = split /\n/, $hash{$date};
foreach my $row (@rows) {
my @col = split /,\s*/, $row;
$total += $col[1]; # add up the values in the "valueA" column
}
}
print $total;

FishMonger

Oops, I hit the submit button before I was finished explaining. I'm going to take a break and if you really want to use the more complex data structure, I'll work on it after dinner.

BioI

Okay, I am following. Using a regular expression is indeed also an option.
Small remark: don't we have to separate the different "server, values" combinations with a newline? As it stands, everything is pasted together, while you split on the newline (\n) character in the second part of the script (where the values A are summed)...

so change:
$hash{$date} .= "$server, $values";
into:
$hash{$date} .= "$server, $values\n";

Another remark: when we are using this:
next unless $hash{$date} =~ /^serveri../i; # skip over unwanted servers
we are skipping every date whose string starts with an unwanted server, but in doing so we also throw away the other servers on that date. Shouldn't we move this line further down and change it into:
next unless $col[0] =~ /^serveri../i; # skip over unwanted servers.
So something like this:

for my $date (sort keys %hash) {
next unless $hash{$date} =~ /^serveri../i; # skip over unwanted servers
my @rows = split /\n/, $hash{$date};
foreach my $row (@rows) {
my @col = split /,\s*/, $row;
next unless $col[0] =~ /^serveri../i; # skip over unwanted servers
$total += $col[1]; # add up the values in the "valueA" column
}
}

uluttrell

ASKER

BioI, Sorry for not answering your questions sooner. I am fighting the flu :(
1) for one specific date, can a server have different values? Or is there for every server on one day only one value?
Answer: For one specific date, a server can have different values.
2) what do you want to count: the sum of the values of a specific server for one day?
Answer: Yes, I want to sum the values of a specific server for one day. If the server has 4 dates reported, I want to sum the values for each of the four dates.

ASKER CERTIFIED SOLUTION

BioI

uluttrell

ASKER

BioI, This works as I desired. Thanks so much. I appreciate your help and efforts. I have increased the points because this was so difficult and am splitting between you and FishMonger.

BioI

Okay, that's a fair deal I guess.
thanx!
CU around!

FishMonger

I'm just waking up and see that we've come to a consensus on how to do the task, but before I rush off to work, I thought I'd add another comment.

There are 2 lines that I'm not sure if you understand what's happening.

my ($server, $date, $values) = split (/,\s?/, $_, 3);

Since we're not using chomp when reading in the data, $values still has its trailing newline, so it shouldn't be necessary to add a second one in the hash assignment.

next unless $hash{$date} =~ /^serveri../i; # skip over unwanted servers

Since the lines that follow that are only working on the list of servers that we want, there is no need to include:

next unless $col[0] =~ /^serveri../i; # skip over unwanted servers

BioI

Hi FishMonger,

I have two remarks on your question, but correct me if I am wrong:
1) I understand what you mean when you say we don't have to add "\n" in:
my ($server, $date, $values) = split (/,\s?/, $_, 3);
but this only works when every line contains only one value. What if there is more than one value? I thought the input file contained more than one value (also valueB, valueC, valueD, ...).

2) You are right that in the code I have written, the line -- next unless $col[0] =~ /^serveri../i; # skip over unwanted servers -- has no function, but in fact I intended to remove the line -- next unless $hash{$date} =~ /^serveri../i; # skip over unwanted servers -- and to look for unwanted servers further on. So the script would be like:
for my $date (sort keys %hash) {
my @rows = split /\n/, $hash{$date};
foreach my $row (@rows) {
my @col = split /,\s*/, $row;
next unless $col[0] =~ /^serveri../i; # skip over unwanted servers
$total += $col[1]; # add up the values in the "valueA" column
}
}

Why? Because I thought that one date could contain several servers with several values e.g.
server1, date1, value1, valuex, valuey, ...
server2, date1, value2, valuez, valueq,...
....
In our output, this will give:
$hash{'date1'} = "server1, value1\nserver2, value2";
but when we assume that server1 is the unwanted server and we add the line
next unless $hash{$date} =~ /^serveri../i; # skip over unwanted servers
then value2 of server2 would be ignored, and I think that is not the purpose...
So when you add the line -- next unless $col[0] =~ /^serveri../i; # skip over unwanted servers -- later on, you avoid this problem.
Am I correct?
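
The difference between the two tests can be checked with a small sketch. It assumes an invented concatenated string for one date in which the first row belongs to an unwanted server, and it simplifies the pattern to /^serveri/i:

```perl
use strict;
use warnings;

# Hypothetical concatenated rows for one date, as built by the .= approach.
my %hash = (
    'date1' => "serverx01, 5, 9\nserveri02, 7, 8\nserveri03, 2, 1\n",
);

my ($total_per_date_test, $total_per_row_test) = (0, 0);
for my $date (sort keys %hash) {
    my @rows = split /\n/, $hash{$date};

    # Per-date test: without /m, ^ only matches at the start of the whole
    # string, so only the first row decides whether the date is used.
    if ($hash{$date} =~ /^serveri/i) {
        $total_per_date_test += (split /,\s*/, $_)[1] for @rows;
    }

    # Per-row test: each row is checked on its own.
    foreach my $row (@rows) {
        my @col = split /,\s*/, $row;
        next unless $col[0] =~ /^serveri/i;
        $total_per_row_test += $col[1];
    }
}

print "per-date test: $total_per_date_test\n";
print "per-row test:  $total_per_row_test\n";
```

With this sample data the per-date test skips the whole date, while the per-row test still counts the two wanted servers.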

BioI

uluttrell,
Very nice that you want to award me 400 points for this question, but I have the feeling that FishMonger and I contributed equally to this question. Maybe 50-50 is more fair?

uluttrell

ASKER

BioI, Thank you for saying that. I will split the points 50-50 but need advice as to how to accomplish that. Do you know?

jmcg

Drop a note in Community Support telling a moderator what you want done. Be sure to include a link to this thread in your message.

uluttrell

ASKER

Thanks jmcg. I will do that.

uluttrell

ASKER

I have posted the request to Community Support area and am awaiting a decision. Thanks.

BioI

uluttrell,
I have read your message on the community support. very nice that you want to give both FishMonger and me 400 points, but then the total points of this question would be 800 points and I think the maximum is 500 :-) What I actually meant with 50-50, was that I would also be happy when you would split the 500 points between both of us, so 250 points each [I guess the administrator will suggest this]. But it's nice that you appreciate our advice ;-)
Thanx
BioI

uluttrell

ASKER

Thanks modulo. I posted the new question in the perl area.

FishMonger, please grab your points. Thanks for your help :-) I greatly appreciate it.

uluttrell

ASKER

FishMonger, how would I modify the code to account for servers that don't start with serveri? I need to sum all of the servers eventually, but I need to group them by server name.

FishMonger

uluttrell,
That was one of the features that I was planning on putting in that script but at the time I was "too lazy". I won't be able to work on it until tomorrow. (I'm in California so it's 5pm here). I was thinking about changing the hash so that the keys would be the server names which would make this a little easier. But doing so will probably depend on your exact requirements.

FishMonger

This needs some more work and I'm not sure if it's exactly what you're wanting, but try it and let me know.

use strict;

my (%servers, $total);
my $file = "input3.txt";

open FILE, $file or die "Unable to open file: $!\n";

while (<FILE>) {
my ($server, $values) = split (/,\s?/, $_, 2);
$servers{$server} .= $values;
}

server_totals(%servers);

sub server_totals {
my %server = @_;
my ($srv, $date, $values, %date_info, @row, $row, $total);

print "Using a regular expression pattern\n";
print "Enter the server name(s) that you want to total: ";
chomp (my $pattern = <STDIN>);

for $srv (sort keys %server) {
next unless $srv =~ /$pattern/i; # skip over unwanted servers
$total = 0; # start a fresh total for each matching server
@row = split /\n/, $server{$srv};
foreach (@row) {
($date, $values) = split (/,\s?/, $_);
$total += $values;
}
$date_info{$date} = $total;
print "$srv $date Totals: $date_info{$date}\n";
}
}

FishMonger

This one is closer to what I think you want, but it still needs work because it only works with valueA and drops the rest of the values. As you can see, I've changed the main hash to a hash_of_hashes, and used another regular hash for the splitting and addition of valueA, but in order to include the rest of the values, we'll probably need to make it a hash_of_arrays. I've got some other things that I need to work on, so I'll check back in later today to see if this is close to what you're looking for.

use strict;
my %servers;

while (<DATA>) {
my ($server, $date, $values) = split (/,\s?/, $_, 3);
$servers{$server}{$date} .= $values; # create a hash_of_hashes
}

server_totals(%servers);

sub server_totals {
my %server = @_;
my ($srv, $date, %date_info, @row);

print "Using a regular expression pattern\n";
print "Enter the server name that you want to total: ";
chomp (my $pattern = <STDIN>);

foreach $srv (sort keys %server) {
next unless $srv =~ /$pattern/i; # skip over unwanted servers
print "$srv\n";
for $date (sort keys %{$server{$srv}}) {
@row = split /\n/, $server{$srv}{$date};
foreach (@row) {
my ($values) = split (/,\s?/, $_);
$date_info{$date} += $values;
}
print "$date: $date_info{$date}\n";
}
}
}

uluttrell

ASKER

FishMonger, The posting from 11/15/03 at 12:08 PM best suits my needs. That works really well.

uluttrell

ASKER

FishMonger, Another question for you. If I run the code as you have it, it sums for the first column immediately following the date. I need to sum the 26th column after the date. I modified the line that reads my ($server, $values) = split (/,\s?/, $_, 2); to read my ($server, $values) = split (/,\s?/, $_, 33); however, it produces 0 for each server in spite of each server having a value for that column. Would you explain to me what this line does and how I can correct it to account for multiple columns that need to be summed?

Thank you in advance.

BioI

when you use:
my ($server, $date, $values) = split (/,\s?/, $_, 3);
this means you want to split $_ at every white space with a limitation of 3 values that are returned.
eg. server, date, 10, 20, 30, 50, 70
will return: server, date, 10

while the same example with my ($server, $date, $values) = split (/,\s?/, $_, 5);
# remark: 5 instead of 3
will give: server, date, 10, 20, 30;

so your code
my ($server, $values) = split (/,\s?/, $_, 33);
will split $_ at white spaces but with a maximum of 33 values returned, but you only assign two of them to a variable.

I guess for what you want:
my ($server, $date, @values) = split (/,\s?/, $_); #skip the maximum number of fields
my $value = $values[25];

p.s. maybe there is a more condensed solution where you can do this in one regular expression(?)

FishMonger

Actually BioI, this is how that split is working.

$_ = 'server, date, 10, 20, 30, 50, 70';
my ($server, $date, $values) = split (/,\s?/, $_, 3); # splits at each comma with an optional space after it.

Limiting the split to three means that the first 2 vars hold the values that you'd expect but the third var holds everything else up to the end of the string (including \n if you didn't chomp it prior to the split).

If you print those vars, you'll see that
$server = 'server'
$date = 'date'
$values = '10, 20, 30, 50, 70'
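
A short sketch to compare the three forms of split discussed here, using the same sample line (the extra variable names are only for illustration). With a limit of 33 and only two variables, the second variable receives the date field, which would explain a total of 0:

```perl
use strict;
use warnings;

my $line = "server, date, 10, 20, 30, 50, 70\n";
chomp $line;

# LIMIT 3: at most three fields; the third keeps the unsplit remainder.
my ($server, $date, $rest) = split /,\s?/, $line, 3;

# No LIMIT: every field is split out.
my ($server2, $date2, @values) = split /,\s?/, $line;

# LIMIT 33 with only two variables: the extra fields are simply discarded.
my ($server3, $second_field) = split /,\s?/, $line, 33;

print "rest='$rest'\n";
print "values=@values\n";
print "second_field='$second_field'\n";
```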

The simplest way to modify that script [I'm assuming you're referring to the script that's using the regular hash, not the one using the hash_of_hashes] to add up the 26th column would be to change the addition assignment in the for loop.

change:
($date, $values) = split (/,\s?/, $_);
$total += $values;

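to this (and add @values to the my declaration at the top of the sub, since the script runs under use strict):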
($date, @values) = split (/,\s?/, $_);
$total += $values[25];
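
The same idea as a stand-alone sketch: split every field out and add up one chosen column per server. The sample rows are invented and have only three value columns, so the index is 2 here; for the 26th column after the date it would be 25. %total_for and $column are illustrative names:

```perl
use strict;
use warnings;

# Hypothetical rows: servername, date, then the value columns.
my @lines = (
    "serveri01, 2003-11-10, 1, 2, 3\n",
    "serveri01, 2003-11-11, 4, 5, 6\n",
    "serverx02, 2003-11-10, 7, 8, 9\n",
);

my $column = 2;   # zero-based index into the values after the date
my %total_for;    # servername => sum of the chosen column

foreach my $line (@lines) {
    chomp $line;
    my ($server, $date, @values) = split /,\s?/, $line;
    $total_for{$server} += $values[$column];
}

print "$_: $total_for{$_}\n" for sort keys %total_for;
```

Keeping the totals in a hash keyed by server name means every server gets its own running sum without resetting a shared scalar.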

BioI

oops, you're right FishMonger :-S
So I use this already a few years without knowing what I am doing :-)

uluttrell

ASKER

Thanks FishMonger :)