dlnewman70

How to read in a file in perl, put it in a hash table, and then sort it by a hash value (not the key)

I have a file named list.txt. Inside that file are many rows that look like this.

GA_ZTC      CC_9811111
IA_ZTC      CC_9811112
IA_ZTC      CC_9711233

Each column is separated by a tab. I need to sort the list by the second column. I was trying to read it into a hash table and sort on the value, but my code is not working well. Ideally, at the end I would have a two column file sorted by the second column. I have thought about writing it out to another file, swapping columns, sorting it and then reordering it, but I think this is not efficient coding. I would appreciate it if somebody could help me figure out how to do this with a hash table. Thanks in advance.
#!/usr/bin/perl
# Read in a file and print it out.

# use strict;

open(INFILE, "Bld_Org2_S3.txt"); # open for input
open(OUTFILE,">","sortedlist.txt");

sub hashValueAscending
{
   $val{$a} cmp $val{$b};
}

my %hash;
while (<INFILE>)
{ 
   chomp; 
   my ($key, $val) = split /\t/;
   $hash{$key} .= exists $hash{$key} ? "$val" : $val;
   foreach $key (sort hashValueAscending (keys(%hash))) 
		{
			print OUTFILE $hash{$key}."\t".$key."\n";
		}
}

#flock(INFILE, LOCK_UN);
close(INFILE);
close(OUTFILE);


SOLUTION
wilcoxon
Hmm.  Two important questions I forgot to ask:
1) Are the values in column 1 unique?
2) Are the values in column 2 unique?

If the answer to #2 is no, then my above code will lose some data.
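To illustrate the uniqueness concern: a Perl hash holds exactly one value per key, so if the sort column is used as the hash key and two rows share that value, the later row silently replaces the earlier one. The code being referred to is not visible in this thread, so the layout below (a hash keyed on column 2) is an assumption. The helper name `load_keyed_on_second` and the `KS_ZTC` row are made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample rows (tab-separated). The last two rows share the
# same second-column value on purpose.
my @rows = (
    "GA_ZTC\tCC_9811111",
    "IA_ZTC\tCC_9711233",
    "KS_ZTC\tCC_9711233",
);

# Build a hash keyed on column 2. A hash keeps one value per key, so a
# repeated key overwrites the row stored earlier.
sub load_keyed_on_second {
    my @lines = @_;
    my %by_second;
    for my $line (@lines) {
        my ($first, $second) = split /\t/, $line;
        $by_second{$second} = $first;
    }
    return %by_second;
}

my %by_second = load_keyed_on_second(@rows);

# Three rows go in, but only two distinct keys come out.
printf "%d rows read, %d rows kept\n", scalar @rows, scalar keys %by_second;
for my $second (sort keys %by_second) {
    print "$by_second{$second}\t$second\n";
}
```

With these sample rows the `IA_ZTC` line is the one that disappears, because the `KS_ZTC` line arrives later with the same key.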
dlnewman70

ASKER

Values in column 1 are not unique. I ran the code above and it removed a lot of lines; I assume this is the reason. I also had to comment out the "use strict;" line. Other than that, I think we are on the right path, if I can figure out the uniqueness issue.

It is possible that column 1 and column 2 are not unique, but together the combination of them will yield unique results. Hopefully that makes sense.

For example,

A  1
A  2
B  1

For further clarity in my example,

If original file looks like
A  1
A  2
B  1

I am trying to get the sort to produce the following
A  1
B  1
A  2
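One way to stay with a hash while allowing repeated values, sketched under the assumption that the data looks like the example above: key the hash on column 2 and store every matching column 1 value in an array (a hash of arrays). The sub name `sort_by_second_column` and the tie-break on column 1 are illustrative choices, not something taken from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group column-1 values under their column-2 value, then emit the rows
# ordered by column 2, with ties ordered by column 1.
# Assumes each line is already chomped and tab-separated.
sub sort_by_second_column {
    my @lines = @_;
    my %firsts_for;    # column-2 value => array ref of column-1 values
    for my $line (@lines) {
        my ($first, $second) = split /\t/, $line;
        push @{ $firsts_for{$second} }, $first;
    }
    my @sorted;
    for my $second (sort keys %firsts_for) {
        for my $first (sort @{ $firsts_for{$second} }) {
            push @sorted, "$first\t$second";
        }
    }
    return @sorted;
}

# The three-row example from the post, written with tab separators.
my @input = ("A\t1", "A\t2", "B\t1");
print "$_\n" for sort_by_second_column(@input);
```

This prints the rows in the order A 1, B 1, A 2. Note that `sort keys` compares strings; if column 2 held plain numbers of varying length, a numeric comparison (`<=>`) would be needed instead.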

jmatix, your one-line code seems to work. Could you briefly explain it? I appreciate the fast response.
Basically, it reads all lines into an array @l, then sorts the lines using the second field as the key and prints the sorted lines.

{(split(/\t+/, $a))[1] cmp (split(/\t+/, $b))[1]}

The code above splits each line on tab characters and compares the second field (subscript [1]) of the two lines. To sort in descending order, just swap $a and $b:

{(split(/\t+/, $b))[1] cmp (split(/\t+/, $a))[1]}
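Putting the pieces together, here is a rough reconstruction of the approach described above. The original one-liner is not shown in this thread, so treat this as a sketch rather than the exact code that was posted. The sub name `sort_lines_by_second_field` and its `$descending` flag are hypothetical; the sample rows are the ones from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sort whole lines by their second tab-separated field. The lines stay in
# an array rather than a hash, so repeated values in either column are
# preserved.
sub sort_lines_by_second_field {
    my ($lines, $descending) = @_;
    if ($descending) {
        # Swapping $a and $b reverses the sort order.
        return sort { (split(/\t+/, $b))[1] cmp (split(/\t+/, $a))[1] } @$lines;
    }
    return sort { (split(/\t+/, $a))[1] cmp (split(/\t+/, $b))[1] } @$lines;
}

# Sample rows from the question (already chomped).
my @l = (
    "GA_ZTC\tCC_9811111",
    "IA_ZTC\tCC_9811112",
    "IA_ZTC\tCC_9711233",
);

print "Ascending:\n";
print "$_\n" for sort_lines_by_second_field(\@l);

print "Descending:\n";
print "$_\n" for sort_lines_by_second_field(\@l, 1);
```

To run this against a real file, the lines could be read with something like `chomp(my @l = <$fh>);` after a checked three-argument `open`. The comparator calls `split` on every comparison, which is fine for small files; for very large inputs it may be worth extracting the key once per line before sorting.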

I really appreciate both experts who helped me with this problem. I split the points because both gave valuable input and helped point me in the right direction. I gave jmatix more points because his solution seemed to work best, partly because it handled the non-unique fields.