
How to read in a file in Perl, put it in a hash table, and then sort it by a hash value (not the key)

I have a file named list.txt. Inside that file are many rows that look like this.

GA_ZTC      CC_9811111
IA_ZTC      CC_9811112
IA_ZTC      CC_9711233

The columns are separated by a tab. I need to sort the list by the second column. I was trying to read it into a hash table and sort on the value, but my code is not working. Ideally, I would end up with a two-column file sorted by the second column. I have thought about writing it out to another file with the columns swapped, sorting that, and then swapping the columns back, but that does not seem efficient. I would appreciate it if somebody could help me figure out how to do this with a hash table. Thanks in advance.

#!/usr/bin/perl
# Read in a file and print it out.

# use strict;

open(INFILE, "Bld_Org2_S3.txt"); # open for input
open(OUTFILE,">","sortedlist.txt");

sub hashValueAscending
{
   $val{$a} cmp $val{$b};
}

my %hash;
while (<INFILE>)
{ 
   chomp; 
   my ($key, $val) = split /\t/;
   $hash{$key} .= exists $hash{$key} ? "$val" : $val;
   foreach $key (sort hashValueAscending (keys(%hash))) 
		{
			print OUTFILE $hash{$key}."\t".$key."\n";
		}
}

#flock(INFILE, LOCK_UN);
close(INFILE);
close(OUTFILE);


SOLUTION

wilcoxon

This solution is only available to members.

wilcoxon

Hmm. Two important questions I forgot to ask:
1) Are the values in column 1 unique?
2) Are the values in column 2 unique?

If the answer to #2 is no, then my above code will lose some data.
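To illustrate the general concern (this is not the code from the solution above): a Perl hash keeps only one value per key, so loading rows into a hash keyed on a column that contains repeated values silently overwrites earlier rows. A minimal sketch, assuming made-up sample rows and a hypothetical helper named load_by_column:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a hash keyed on one column (index 0 or 1)
# of tab-separated rows, storing the other column as the value.
sub load_by_column {
    my ($col, @rows) = @_;
    my %h;
    for my $row (@rows) {
        my @f = split /\t/, $row;
        $h{ $f[$col] } = $f[ 1 - $col ];   # a repeated key overwrites the earlier row
    }
    return %h;
}

my @rows    = ("A\t1", "A\t2", "B\t1");    # made-up sample data
my %by_col2 = load_by_column(1, @rows);    # key on the second column

# Three rows go in, but only two distinct keys ("1" and "2") survive.
printf "%d rows in, %d keys kept\n", scalar @rows, scalar keys %by_col2;
```

If neither column is unique on its own, keeping the rows in an array and sorting that avoids the problem entirely.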

ASKER CERTIFIED SOLUTION

Justin Mathews

This solution is only available to members.

dlnewman70

ASKER

The values in column 1 are not unique. I ran the code above and it removed a lot of lines; I assume this is the reason. I also had to comment out the use strict; line. Other than that, I think we are on the right path, if I can figure out the uniqueness issue.

It is possible that neither column 1 nor column 2 is unique on its own, but the combination of the two will be unique. Hopefully that makes sense.

For example,

A 1
A 2
B 1

dlnewman70

ASKER

For further clarity in my example,

If original file looks like
A 1
A 2
B 1

I am trying to get the sort to produce the following
A 1
B 1
A 2
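One way to get that ordering is to keep the rows in an array, sort on the second column, and fall back to the first column when the second column ties. This is only a sketch: the helper name sort_by_second_then_first is hypothetical, the rows are assumed to be tab-separated as in the original file, and the comparison is a string comparison (cmp).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort tab-separated rows by column 2,
# using column 1 to break ties.
sub sort_by_second_then_first {
    my @rows = @_;
    return map  { join "\t", @$_ }                                   # rebuild each row
           sort { $a->[1] cmp $b->[1] or $a->[0] cmp $b->[0] }       # column 2, then column 1
           map  { [ split /\t/ ] }                                   # split each row once
           @rows;
}

# Prints the rows in the order A 1, B 1, A 2 (tab-separated).
print "$_\n" for sort_by_second_then_first("A\t1", "A\t2", "B\t1");
```

No rows are lost here because nothing is used as a hash key, so duplicates in either column are harmless.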

dlnewman70

ASKER

jmatix, your one-line code seems to work. Could you briefly explain it? I appreciate the fast response.

Justin Mathews

Basically, it reads all lines into an array @l, then sorts the lines using the second field as the key and prints the sorted lines.

{(split(/\t+/, $a))[1] cmp (split(/\t+/, $b))[1]}

The above code splits each line on tab characters and compares the second field (subscript [1]) of each line. To sort in descending order, just swap $a and $b:

{(split(/\t+/, $b))[1] cmp (split(/\t+/, $a))[1]}
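The one-liner itself is not shown in this thread, so the following is only a sketch of the approach as described: read every line into an array, sort with the comparison block above, and write the result. The helper name sort_file_by_second_field is hypothetical, and the file names list.txt and sortedlist.txt are taken from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write the lines of $in to $out, sorted (ascending,
# string comparison) on the second tab-separated field.
sub sort_file_by_second_field {
    my ($in, $out) = @_;

    open(my $in_fh, '<', $in) or die "Cannot open $in: $!";
    chomp(my @l = <$in_fh>);          # read all lines into an array, dropping newlines
    close($in_fh);

    # Split each line on tabs and compare field [1] of the two lines.
    my @sorted = sort { (split(/\t+/, $a))[1] cmp (split(/\t+/, $b))[1] } @l;

    open(my $out_fh, '>', $out) or die "Cannot open $out: $!";
    print {$out_fh} "$_\n" for @sorted;
    close($out_fh) or die "Cannot close $out: $!";

    return scalar @sorted;            # number of lines written
}

# Run against the question's file names only if the input file exists.
sort_file_by_second_field('list.txt', 'sortedlist.txt') if -e 'list.txt';
```

Because each line stays intact in the array, repeated values in either column do not cause any rows to be dropped. If the second column were purely numeric, <=> would be the comparison to use instead of cmp.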

dlnewman70

ASKER

I really appreciate both experts who helped me with this problem. I split the points because both gave valuable input and got me pointed in the right direction. I gave jmatix more of the points because his solution seemed to work best, partly because of how it handled the non-unique fields.