dlnewman70
asked on
How to read in a file in perl, put it in a hash table, and then sort it by a hash value (not the key)
I have a file named list.txt. Inside that file are many rows that look like this.
GA_ZTC CC_9811111
IA_ZTC CC_9811112
IA_ZTC CC_9711233
Each column is separated by a tab. I need to sort the list by the second column. I was trying to read it into a hash table and sort on the value, but my code is not working well. Ideally, at the end I would have a two column file sorted by the second column. I have thought about writing it out to another file, swapping columns, sorting it and then reordering it, but I think this is not efficient coding. I would appreciate it if somebody could help me figure out how to do this with a hash table. Thanks in advance.
#!/usr/bin/perl
# Read in a file and print it out.
# use strict;
open(INFILE, "Bld_Org2_S3.txt"); # open for input
open(OUTFILE,">","sortedlist.txt");
sub hashValueAscending
{
$val{$a} cmp $val{$b};
}
my %hash;
while (<INFILE>)
{
chomp;
my ($key, $val) = split /\t/;
$hash{$key} .= exists $hash{$key} ? "$val" : $val;
foreach $key (sort hashValueAscending (keys(%hash)))
{
print OUTFILE $hash{$key}."\t".$key."\n";
}
}
#flock(INFILE, LOCK_UN);
close(INFILE);
close(OUTFILE);
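A few things in the script above would explain the behaviour. The comparison sub looks up `%val`, which is never populated (the data is in `%hash`). The `foreach` sits inside the `while` loop, so the whole hash is printed again after every input line. And `use strict` is commented out, so Perl never reports the undeclared `%val`. Here is a minimal corrected sketch, assuming the first column is unique. The sub name `keys_by_value` and the file names in the comments are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the keys of a hash ordered by their values (string comparison),
# using the key itself as a tie-breaker so the order is predictable.
sub keys_by_value {
    my ($hash) = @_;
    return sort { $hash->{$a} cmp $hash->{$b} or $a cmp $b } keys %$hash;
}

# Runs only when an input file is named on the command line, e.g.
#   perl sort_by_value.pl list.txt > sortedlist.txt   (hypothetical names)
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    my %hash;
    while (my $line = <$in>) {
        chomp $line;
        my ($key, $val) = split /\t/, $line;
        next unless defined $val;   # skip blank or malformed lines
        $hash{$key} = $val;         # a repeated key overwrites the earlier value
    }
    close($in);
    # Sort once, after the whole file has been read.
    print "$_\t$hash{$_}\n" for keys_by_value(\%hash);
}
```

Note that a plain hash keeps only one value per key, so this version silently drops rows whenever the first column repeats.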
ASKER
Values in column 1 are not unique. I ran the code above and it removed a lot of lines; I assume this is the reason. I also had to comment out the "use strict;" line. Other than that I think we are on the right path, if I can figure out the uniqueness issue.
It is possible that column 1 and column 2 are not unique, but together the combination of them will yield unique results. Hopefully that makes sense.
For example,
A 1
A 2
B 1
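If a hash is still wanted in this situation, one option is to key it on the whole pair rather than on column 1 alone. This is a rough sketch, assuming each column 1 / column 2 combination really is unique; the names `pairs_to_hash` and `%rows` are made up for illustration.

```perl
use strict;
use warnings;

# Build a hash keyed on the whole "col1<TAB>col2" pair, so rows that share
# a first column do not overwrite each other. The value is the second column.
sub pairs_to_hash {
    my %by_pair;
    for my $line (@_) {
        my ($col1, $col2) = split /\t/, $line;
        next unless defined $col2;   # skip blank or malformed lines
        $by_pair{"$col1\t$col2"} = $col2;
    }
    return %by_pair;
}

my %rows = pairs_to_hash("A\t1", "A\t2", "B\t1");

# Order the composite keys by the stored second column, then by the key itself.
my @sorted = sort { $rows{$a} cmp $rows{$b} or $a cmp $b } keys %rows;
print "$_\n" for @sorted;   # prints A<TAB>1, B<TAB>1, A<TAB>2, one per line
```

Since the key already holds the full row, the hash adds little over a plain array of lines here. Its main effect is to collapse exact duplicate rows.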
ASKER
For further clarity in my example,
If the original file looks like
A 1
A 2
B 1
I am trying to get the sort to produce the following
A 1
B 1
A 2
ASKER
jmatix, your one line code seems to work. Could you possibly explain the code briefly? I appreciate the fast response.
Basically, it reads all the lines into an array @l, sorts the lines using the second field as the sort key, and prints the sorted lines.
{(split(/\t+/, $a))[1] cmp (split(/\t+/, $b))[1]}
The above code splits each line on tab characters and compares the second field (subscript [1]) of the two lines. If you want to sort in descending order, just interchange $a and $b:
{(split(/\t+/, $b))[1] cmp (split(/\t+/, $a))[1]}
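The original one-liner is not reproduced in this thread, so the following is only a sketch of the approach described, written out as a full script. The sub name `sort_on_field2` and the file names in the comment are assumptions, and every line is assumed to contain at least two tab-separated fields.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sort lines on their second tab-separated field (string comparison).
# Pass a true second argument for descending order.
# Ties fall back to comparing the whole line, so the result is predictable.
sub sort_on_field2 {
    my ($lines, $descending) = @_;
    my @sorted = sort {
        (split /\t+/, $a)[1] cmp (split /\t+/, $b)[1] or $a cmp $b
    } @$lines;
    return $descending ? reverse @sorted : @sorted;
}

# Runs only when an input file is named on the command line, e.g.
#   perl sort_col2.pl list.txt > sortedlist.txt   (hypothetical names)
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    chomp(my @l = <$in>);   # read every line into an array
    close($in);
    print "$_\n" for sort_on_field2(\@l);
}
```

Because the lines are kept in an array rather than a hash, repeated values in either column are preserved. The comparator calls split on every comparison, which is fine for small files; for large ones the fields could be extracted once up front.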
ASKER
I really appreciate both experts who helped me with this problem. I split the points based upon the valuable input and the fact that both experts really helped get me pointed in the right direction. I gave jmatix the greater points because his solution seemed to work the best. Part of it was associated with the uniqueness of the fields.
1) Are the values in column 1 unique?
2) Are the values in column 2 unique?
If the answer to #2 is no, then my above code will lose some data.