
How to read a file in Perl, put it in a hash table, and then sort it by hash value (not the key)

Posted on 2010-11-23
Last Modified: 2012-05-10
I have a file named list.txt that contains many rows like these:

GA_ZTC      CC_9811111
IA_ZTC      CC_9811112
IA_ZTC      CC_9711233

The columns are separated by a tab. I need to sort the list by the second column. I tried reading it into a hash table and sorting on the value, but my code is not working. Ideally, I would end up with a two-column file sorted by the second column. I have thought about writing it out to another file, swapping the columns, sorting it, and then reordering it, but that does not seem like efficient code. I would appreciate help figuring out how to do this with a hash table. Thanks in advance.
#!/usr/bin/perl
# Read in a file and print it out.

# use strict;

open(INFILE, "Bld_Org2_S3.txt"); # open for input
open(OUTFILE,">","sortedlist.txt");

sub hashValueAscending
{
   $val{$a} cmp $val{$b};
}

my %hash;
while (<INFILE>)
{ 
   chomp; 
   my ($key, $val) = split /\t/;
   $hash{$key} .= exists $hash{$key} ? "$val" : $val;
   foreach $key (sort hashValueAscending (keys(%hash)))
   {
      print OUTFILE $hash{$key}."\t".$key."\n";
   }
}

#flock(INFILE, LOCK_UN);
close(INFILE);
close(OUTFILE);
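For reference, here is a minimal sketch of the idiom the title asks about: sorting a hash's keys by their values. In the code above, hashValueAscending looks up %val, which is never populated (the values live in %hash), so every comparison sees two undefined values. Also, the foreach loop sits inside the while loop, so the whole hash is printed once per input line. The sub name keys_by_value and the sample data below are hypothetical. Keep in mind that a hash keyed on column 1 holds only one value per key, so duplicate column-1 entries cannot all be kept this way.

```perl
use strict;
use warnings;

# Return the keys of a hash, ordered by their values (string comparison).
# Inside the sort block, $a and $b are keys; the block compares their values.
sub keys_by_value {
    my ($hash) = @_;    # a hash reference
    return sort { $hash->{$a} cmp $hash->{$b} } keys %$hash;
}

# Hypothetical sample data (column 1 => column 2), one entry per column-1 value.
my %col2_for = (
    GA_ZTC => 'CC_9811111',
    IA_ZTC => 'CC_9711233',
);

# Print the two columns in their original order, sorted by column 2.
for my $key (keys_by_value(\%col2_for)) {
    print "$key\t$col2_for{$key}\n";
}
```

This prints the IA_ZTC line first, because CC_9711233 sorts before CC_9811111.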


Question by:dlnewman70

Assisted Solution

by:wilcoxon
wilcoxon earned 200 total points
ID: 34198856
This should work...
#!/usr/bin/perl
# Read in a file and print it out.

use strict;
use warnings;

open(INFILE, "Bld_Org2_S3.txt") or die "could not open Bld_Org2_S3.txt: $!";
open(OUTFILE,">","sortedlist.txt") or die "could not write sortedlist.txt: $!";

my %hash;
while (<INFILE>) { 
    chomp; 
    my ($key, $val) = split /\t/;
# I'm not sure what you were trying to do with this line - both branches of the ?: yield $val,
# so it just appends $val to any value already stored for $key
#   $hash{$key} .= exists $hash{$key} ? "$val" : $val;
    # create the hash in reverse order to make it simpler
    $hash{$val} = $key;
}
close(INFILE);

foreach my $val (sort keys %hash) {    # needs 'my' under use strict
    print OUTFILE "$hash{$val}\t$val\n";
}
close(OUTFILE);


Expert Comment

by:wilcoxon
ID: 34198870
Hmm.  Two important questions I forgot to ask:
1) Are the values in column 1 unique?
2) Are the values in column 2 unique?

If the answer to #2 is no, the code above will lose data: the hash is keyed on column 2, so each line overwrites any earlier line with the same column-2 value.
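One way to keep a hash-based approach without losing lines when column 2 repeats (a sketch, not code posted in this thread) is to store a list of column-1 values for each column-2 value, i.e. a hash of arrays. The sub name sort_lines_by_col2 and the sample lines are hypothetical, and the input is assumed to be exactly two tab-separated columns.

```perl
use strict;
use warnings;

# Sort tab-separated "col1<TAB>col2" lines by column 2 without losing lines
# whose column-2 value repeats: each column-2 value maps to an ARRAY of
# column-1 values (a hash of arrays) instead of a single scalar.
sub sort_lines_by_col2 {
    my @lines = @_;
    my %keys_for;    # column-2 value => [column-1 values, in input order]
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($key, $val) = split /\t/, $copy;
        push @{ $keys_for{$val} }, $key;    # autovivifies the array ref
    }
    my @out;
    for my $val (sort keys %keys_for) {
        push @out, "$_\t$val" for @{ $keys_for{$val} };
    }
    return @out;    # lines without trailing newlines
}

# Hypothetical sample input.
print "$_\n" for sort_lines_by_col2("A\t1\n", "A\t2\n", "B\t1\n");
```

Lines that share a column-2 value come out in their input order, so the sample prints A 1, B 1, A 2.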

Accepted Solution

by:jmatix
jmatix earned 300 total points
ID: 34198996
If you don't care about using a hash, this one-liner will do it:

perl -e '@l = <>; print sort{(split(/\t+/, $a))[1] cmp (split(/\t+/, $b))[1]} @l' data.txt >output.txt

If you are on Windows, use double quotes, since cmd.exe does not treat single quotes as quoting characters:

perl -e "@l = <>; print sort{(split(/\t+/, $a))[1] cmp (split(/\t+/, $b))[1]} @l" data.txt >output.txt

Author Comment

by:dlnewman70
ID: 34199555
Values in column 1 are not unique. I ran the code above and it removed a lot of lines; I assume this is the reason. I also had to comment out the use strict; line. Other than that, I think we are on the right path, if I can figure out the uniqueness issue.

Column 1 and column 2 may each contain duplicates, but the combination of the two is unique. Hopefully that makes sense.

For example,

A  1
A  2
B  1


Author Comment

by:dlnewman70
ID: 34199603
To clarify my example:

If the original file looks like this:
A  1
A  2
B  1

I am trying to get the sort to produce the following:
A  1
B  1
A  2
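The desired order above amounts to sorting on column 2 and breaking ties on column 1. A minimal sketch of that comparison, assuming tab-separated lines with exactly two fields (the sub name sort_by_col2_then_col1 is hypothetical):

```perl
use strict;
use warnings;

# Sort "col1<TAB>col2" lines by column 2, using column 1 to break ties.
# Splitting on tab OR newline keeps the trailing "\n" out of column 2.
sub sort_by_col2_then_col1 {
    return sort {
        my ($a1, $a2) = split /[\t\n]/, $a;
        my ($b1, $b2) = split /[\t\n]/, $b;
        ($a2 cmp $b2) || ($a1 cmp $b1);
    } @_;
}

# Hypothetical input in the original file order from the example above.
print for sort_by_col2_then_col1("A\t1\n", "A\t2\n", "B\t1\n");
```

Because the tie-break is explicit, the A 1 / B 1 order does not depend on where those lines appear in the input.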


Author Comment

by:dlnewman70
ID: 34199693
jmatix, your one-liner seems to work. Could you briefly explain the code? I appreciate the fast response.

Expert Comment

by:jmatix
ID: 34199792
Basically, it reads all the lines into the array @l, sorts them using the second field as the key, and prints the sorted lines.

{(split(/\t+/, $a))[1] cmp (split(/\t+/, $b))[1]}

The block above splits each line at tab characters and compares the second fields (subscript [1]) of the two lines. If you want to sort in descending order, just swap $a and $b:

{(split(/\t+/, $b))[1] cmp (split(/\t+/, $a))[1]}
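The sort block in the one-liner splits both lines on every comparison, so on a large file each line is split many times. A common Perl idiom that avoids this is the Schwartzian transform, which computes each line's key once. This is a sketch under the same assumptions as the one-liner (tab-separated fields, key is the second field); the sub name sort_by_field2 and its $descending flag are hypothetical.

```perl
use strict;
use warnings;

# Sort lines on their second tab-separated field using a Schwartzian transform:
# build [key, line] pairs once, sort on the cached key, then keep only the line.
sub sort_by_field2 {
    my ($descending, @lines) = @_;
    return map  { $_->[1] }
           sort {
               $descending ? $b->[0] cmp $a->[0]    # descending: swap $a and $b
                           : $a->[0] cmp $b->[0]
           }
           map  { [ (split /\t+/, $_)[1], $_ ] } @lines;
}

my @lines = ("A\t2\n", "B\t1\n", "C\t3\n");
my @ascending = sort_by_field2(0, @lines);
print @ascending;    # prints the B, A, C lines (column 2 = 1, 2, 3)
```

The descending case uses the same $a/$b swap described above, just applied to the cached keys.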


Author Closing Comment

by:dlnewman70
ID: 34199959
I really appreciate both experts who helped me with this problem. I split the points because both gave valuable input and really helped point me in the right direction. I gave jmatix more points because his solution worked best, partly because of the uniqueness issue with the fields.