Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
Solved

Sort values as text

Posted on 2013-11-28
Medium Priority
287 Views
This code

#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
open STDOUT,">output.txt" or die "output.txt \$!";
print reverse sort <>;

Works great. It sorts a list of data, with two columns, into descending order based on the first column.

The problem that I have now, is that I require need further sorting.

the sorted data looks like this:

60000 84
51000 1260
50100 1260
50010 1260
50001 1260
42000 5670
41100 12600
41010 12600
41001 12600
40200 5670
40110 12600
40101 12600
.....
00600 210
00510 2520
00501 2520
00420 9450
00411 21000
00402 9450
Descending order - everything good so far.

The problem is that there are groups within this first column.

All groups containing

6
5,1
4,2
4,1,1
3,3
3,2,1
3,1,1,1
2,2,2
2,2,1,1
2,1,1,1,1

should be grouped together, then put into descending order as before.

Could someone help me by adjusting the above code to perform this ?

thankyou
0
Question by:Europa MacDonald
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

• Help others & share knowledge
• Earn cash & points
• Learn & ask questions
• 5
• 5
• 5

LVL 84

Expert Comment

ID: 39683779
Could you clarify what the groups would be in the above example?
0

Author Comment

ID: 39683796
6   - 60000 06000 00600 etc
5,1 51000 50100 50010 15000 etc
4,2 42000 04020 etc
4,1,1
3,3 33000 - 00033
3,2,1
3,1,1,1
2,2,2
2,2,1,1
2,1,1,1,1  21111 12111 1121 etc

Is that clearer ?
0

LVL 84

Expert Comment

ID: 39683803
What group would 123456 go into?
0

LVL 27

Expert Comment

ID: 39683811
When you say "groups containing" do you mean the number contains each of those digits in the order specified (so 5,1 would match 5100 and 5010 but not 1500)?

You can't do this with a simple sort.  You'll have to use a custom sort function as I did on your previous question (though it will be more complex).
0

Author Comment

ID: 39683815
there isnt a 123456

as per the original sample list, and the described groupings above,

only groups that contain these numbers and only these numbers plus a few zeros.

there should be a total of five digits, the numbers plus zeros.

ie

6   - 60000 06000 00600 etc      and all combinations of these
5,1 - 51000 50100 50010 15000 etc     and all combinations of these
4,2 - 42000 04020 etc and all     combinations of these
4,1,1- 41100       and all combinations of these
3,3 33000 - 00033   and all combinations of these
3,2,1-  32100
3,1,1,1 - 31110 and all combinations of these
2,2,2 - 22200
2,2,1,1 - 22110
2,1,1,1,1  21111 12111 1121 etc

and so on

does that make it clearer ?
0

Author Comment

ID: 39683817
When you say "groups containing" do you mean the number contains each of those digits in the order specified (so 5,1 would match 5100 and 5010 but not 1500)?

must be 5 digits, with zeros making up the rest

so 5,1  could be

51000
50100
50010
50001
15000
05100
05010
05001
10500

etc

same for all combination
0

LVL 84

Expert Comment

ID: 39683830
What group would 12456 go in?
0

LVL 27

Expert Comment

ID: 39683838
I hadn't seen your response about the rest 0s so this isn't perfect but it'll be close.
``````use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
open OUT, '>output.txt' or die "could not write output.txt: \$!";
print OUT sort col1_rev <>;
sub col1_rev {
# split the columns and only compare the first one
my \$cola = (split /\s+/, \$a)[0];
my \$colb = (split /\s+/, \$b)[0];
my \$grpa = group(\$cola);
my \$grpb = group(\$colb);
return \$grpa <=> \$grpb unless (\$grpa == \$grpb);
return \$colb <=> \$cola;
}
sub group {
my (\$val) = @_;
return 1 if (\$val =~ m{6});
return 2 if (\$val =~ m{5.*1|1.*5});
return 3 if (\$val =~ m{4.*2|2.*4});
return 4 if (\$val =~ m{1.*1} and \$val =~ m{4});
return 5 if (\$val =~ m{3.*3});
return 6 if (\$val =~ m{3} and \$val =~ m{2} and \$val =~ m{1});
return 7 if (\$val =~ m{1.*1.*1} and \$val =~ m{3});
return 8 if (\$val =~ m{2.*2.*2});
return 9 if (\$val =~ m{2.*2} and \$val =~ m{1.*1});
return 10 if (\$val =~ m{1.*1.*1.*1} and \$val =~ m{2});
}
``````
0

Author Comment

ID: 39683851
There isnt a 123456

These codes (in the first column) are counting the frequency of values in my list. The second column is the count.

there are five columns in this set of data, on each row, 1 2 3 4 and 5

a 0 means there are none of the values I am looking for in that column

I can only have a total of 6 values in each row, so if I find a group of 6 in one column, then there can be no values in any other column so I would have a pattern 6 0 0 0 0, or 0 0 6 0 0 depending on which column the values are found in. If I find a 5 in one column there can only be 1 incidence of another value that meets my search criteria in another column, so 5 in one column, 1 in another and the other three columns must be empty, and so on with 4 and 2, 4 and 1 and 1 etc
0

LVL 27

Expert Comment

ID: 39683852
Revised for rest must be 0s...  Rerevised since each column can only contain total 6 (mostly the same as previous code).
``````use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
open OUT, '>output.txt' or die "could not write output.txt: \$!";
print OUT sort col1_rev <>;
sub col1_rev {
# split the columns and only compare the first one
my \$cola = (split /\s+/, \$a)[0];
my \$colb = (split /\s+/, \$b)[0];
my \$grpa = group(\$cola);
my \$grpb = group(\$colb);
return \$grpa <=> \$grpb unless (\$grpa == \$grpb);
return \$colb <=> \$cola;
}
sub group {
my (\$val) = @_;
return 1 if (\$val =~ m{6});
return 2 if (\$val =~ m{5.*1|1.*5});
return 3 if (\$val =~ m{4.*2|2.*4});
return 4 if (\$val =~ m{1.*1} and \$val =~ m{4});
return 5 if (\$val =~ m{3.*3});
return 6 if (\$val =~ m{3} and \$val =~ m{2} and \$val =~ m{1});
return 7 if (\$val =~ m{1.*1.*1} and \$val =~ m{3});
return 8 if (\$val =~ m{2.*2.*2});
return 9 if (\$val =~ m{2.*2} and \$val =~ m{1.*1});
return 10 if (\$val =~ m{1.*1.*1.*1} and \$val =~ m{2});
die "\$val does not match any established grouping";
}
``````
0

Author Comment

ID: 39683866
wilcoxon - output file has no content at all
0

LVL 84

Expert Comment

ID: 39683869
#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
my %groups;
unshift @{\$groups{join'',sort split//,(split)[0]}},\$_ for sort <>;
open STDOUT,">output.txt" or die "output.txt \$!";
print @\$_,"\n" for values %groups;
0

LVL 27

Assisted Solution

wilcoxon earned 400 total points
ID: 39683875
An alternate way to do the group checking (probably more efficient than using regexes)...
``````use strict;
use warnings;
use List::Util qw(sum);
@ARGV or @ARGV=('sample.txt');
open OUT, '>output.txt' or die "could not write output.txt: \$!";
print OUT sort col1_rev <>;
sub col1_rev {
# split the columns and only compare the first one
my \$cola = (split /\s+/, \$a)[0];
my \$colb = (split /\s+/, \$b)[0];
my \$grpa = group(\$cola);
my \$grpb = group(\$colb);
return \$grpa <=> \$grpb unless (\$grpa == \$grpb);
return \$colb <=> \$cola;
}
sub group {
my (\$val) = @_;
my @dig = sort { \$b <=> \$a } split //, \$val;
die "\$val is invalid\n" unless (sum(@dig) == 6);
return 1 if (\$dig[0] == 6);
return 2 if (\$dig[0] == 5 and \$dig[1] == 1);
return 3 if (\$dig[0] == 4 and \$dig[1] == 2);
return 4 if (\$dig[0] == 4 and \$dig[1] == 1 and \$dig[2] == 1);
return 5 if (\$dig[0] == 3 and \$dig[1] == 3);
return 6 if (\$dig[0] == 3 and \$dig[1] == 2 and \$dig[2] == 1);
return 7 if (\$dig[0] == 3 and \$dig[1] == 1 and \$dig[2] == 1 and \$dig[3] == 1);
return 8 if (\$dig[0] == 2 and \$dig[1] == 2 and \$dig[3] == 2);
return 9 if (\$dig[0] == 2 and \$dig[1] == 2 and \$dig[3] == 1 and \$dig[4] == 1);
return 10 if (\$dig[0] == 2 and \$dig[1] == 1 and \$dig[2] == 1 and \$dig[3] == 1 and \$dig[4] == 1);
}
``````
0

LVL 27

Expert Comment

ID: 39683887
Tested both the regex version and the latest variant and they both produce the expected output in output.txt for me.  Is sample.txt empty or is there an error if you run it from the command-line (perl script.pl)?

I hadn't gone back and rethought the groupings (until looking at ozo's latest).  group sub could be simplified further (or removed and written differently) since all values should add up to 6 but it works quickly as-is.
0

LVL 84

Accepted Solution

ozo earned 1600 total points
ID: 39683977
If you want the groups ordered, in addition to sorting within the groups:
#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
my %groups;
unshift @{\$groups{reverse sort split//,(split)[0]}},\$_ for sort <>;
open STDOUT,">output.txt" or die "output.txt \$!";
print @{\$groups{\$_}},"\n" for reverse sort keys %groups;

Another way to do it:
#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
my %groups;
push @{\$groups{reverse sort split//,(split)[0]}},\$_ while <>;
open STDOUT,">output.txt" or die "output.txt \$!";
print reverse(sort @{\$groups{\$_}}),"\n" for reverse sort keys %groups;

If you don't need the separator between groups:
#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
open STDOUT,">output.txt" or die "output.txt \$!";
print /\0(.*)/s for reverse sort map{join"",reverse("\0",sort split//,(split)[0]),\$_} <>;
0

Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In the distant past (last year) I hacked together a little toy that would allow a couple of Manager types to query, preview, and extract data from a number of MongoDB instances, to their tool of choice: Excel (http://dilbert.com/strips/comic/2007-08â€¦
Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: Tâ€¦
Six Sigma Control Plans
Suggested Courses
Course of the Month4 days, 10 hours left to enroll

661 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.