Sort values as text

This code

#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
open STDOUT,">output.txt" or die "output.txt $!";
print reverse sort <>;

Works great. It sorts a list of data, with two columns, into descending order based on the first column.

The problem that I have now, is that I require need further sorting.

the sorted data looks like this:

60000 84
51000 1260
50100 1260
50010 1260
50001 1260
42000 5670
41100 12600
41010 12600
41001 12600
40200 5670
40110 12600
40101 12600
.....
00600 210
00510 2520
00501 2520
00420 9450
00411 21000
00402 9450
Descending order - everything good so far.

The problem is that there are groups within this first column.

All groups containing

6
5,1
4,2
4,1,1
3,3
3,2,1
3,1,1,1
2,2,2
2,2,1,1
2,1,1,1,1

should be grouped together, then put into descending order as before.

Could someone help me by adjusting the above code to perform this ?

thankyou
Europa MacDonaldChief slayer of dragonsAsked:
Who is Participating?

Improve company productivity with a Business Account.Sign Up

x
 
ozoConnect With a Mentor Commented:
If you want the groups ordered, in addition to sorting within the groups:
#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
my %groups;
unshift @{$groups{reverse sort split//,(split)[0]}},$_ for sort <>;
open STDOUT,">output.txt" or die "output.txt $!";
print @{$groups{$_}},"\n" for reverse sort keys %groups;

Another way to do it:
#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
my %groups;
push @{$groups{reverse sort split//,(split)[0]}},$_ while <>;
open STDOUT,">output.txt" or die "output.txt $!";
print reverse(sort @{$groups{$_}}),"\n" for reverse sort keys %groups;

If you don't need the separator between groups:
#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
open STDOUT,">output.txt" or die "output.txt $!";
print /\0(.*)/s for reverse sort map{join"",reverse("\0",sort split//,(split)[0]),$_} <>;
0
 
ozoCommented:
Could you clarify what the groups would be in the above example?
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
6   - 60000 06000 00600 etc
5,1 51000 50100 50010 15000 etc
4,2 42000 04020 etc
4,1,1
3,3 33000 - 00033
3,2,1
3,1,1,1
2,2,2
2,2,1,1
2,1,1,1,1  21111 12111 1121 etc

Is that clearer ?
0
Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

 
ozoCommented:
What group would 123456 go into?
0
 
wilcoxonCommented:
When you say "groups containing" do you mean the number contains each of those digits in the order specified (so 5,1 would match 5100 and 5010 but not 1500)?

You can't do this with a simple sort.  You'll have to use a custom sort function as I did on your previous question (though it will be more complex).
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
there isnt a 123456

as per the original sample list, and the described groupings above,

only groups that contain these numbers and only these numbers plus a few zeros.

there should be a total of five digits, the numbers plus zeros.

ie

6   - 60000 06000 00600 etc      and all combinations of these
5,1 - 51000 50100 50010 15000 etc     and all combinations of these
4,2 - 42000 04020 etc and all     combinations of these
4,1,1- 41100       and all combinations of these
3,3 33000 - 00033   and all combinations of these
3,2,1-  32100
3,1,1,1 - 31110 and all combinations of these
2,2,2 - 22200
2,2,1,1 - 22110
2,1,1,1,1  21111 12111 1121 etc  

and so on

does that make it clearer ?
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
When you say "groups containing" do you mean the number contains each of those digits in the order specified (so 5,1 would match 5100 and 5010 but not 1500)?


must be 5 digits, with zeros making up the rest

so 5,1  could be

51000
50100
50010
50001
15000
05100
05010
05001
10500
   
etc

same for all combination
0
 
ozoCommented:
What group would 12456 go in?
0
 
wilcoxonCommented:
I hadn't seen your response about the rest 0s so this isn't perfect but it'll be close.
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
open OUT, '>output.txt' or die "could not write output.txt: $!";
print OUT sort col1_rev <>;
sub col1_rev {
    # split the columns and only compare the first one
    my $cola = (split /\s+/, $a)[0];
    my $colb = (split /\s+/, $b)[0];
    my $grpa = group($cola);
    my $grpb = group($colb);
    return $grpa <=> $grpb unless ($grpa == $grpb);
    return $colb <=> $cola;
}
sub group {
    my ($val) = @_;
    return 1 if ($val =~ m{6});
    return 2 if ($val =~ m{5.*1|1.*5});
    return 3 if ($val =~ m{4.*2|2.*4});
    return 4 if ($val =~ m{1.*1} and $val =~ m{4});
    return 5 if ($val =~ m{3.*3});
    return 6 if ($val =~ m{3} and $val =~ m{2} and $val =~ m{1});
    return 7 if ($val =~ m{1.*1.*1} and $val =~ m{3});
    return 8 if ($val =~ m{2.*2.*2});
    return 9 if ($val =~ m{2.*2} and $val =~ m{1.*1});
    return 10 if ($val =~ m{1.*1.*1.*1} and $val =~ m{2});
}

Open in new window

0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
There isnt a 123456

These codes (in the first column) are counting the frequency of values in my list. The second column is the count.

there are five columns in this set of data, on each row, 1 2 3 4 and 5

a 0 means there are none of the values I am looking for in that column

I can only have a total of 6 values in each row, so if I find a group of 6 in one column, then there can be no values in any other column so I would have a pattern 6 0 0 0 0, or 0 0 6 0 0 depending on which column the values are found in. If I find a 5 in one column there can only be 1 incidence of another value that meets my search criteria in another column, so 5 in one column, 1 in another and the other three columns must be empty, and so on with 4 and 2, 4 and 1 and 1 etc
0
 
wilcoxonCommented:
Revised for rest must be 0s...  Rerevised since each column can only contain total 6 (mostly the same as previous code).
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
open OUT, '>output.txt' or die "could not write output.txt: $!";
print OUT sort col1_rev <>;
sub col1_rev {
    # split the columns and only compare the first one
    my $cola = (split /\s+/, $a)[0];
    my $colb = (split /\s+/, $b)[0];
    my $grpa = group($cola);
    my $grpb = group($colb);
    return $grpa <=> $grpb unless ($grpa == $grpb);
    return $colb <=> $cola;
}
sub group {
    my ($val) = @_;
    return 1 if ($val =~ m{6});
    return 2 if ($val =~ m{5.*1|1.*5});
    return 3 if ($val =~ m{4.*2|2.*4});
    return 4 if ($val =~ m{1.*1} and $val =~ m{4});
    return 5 if ($val =~ m{3.*3});
    return 6 if ($val =~ m{3} and $val =~ m{2} and $val =~ m{1});
    return 7 if ($val =~ m{1.*1.*1} and $val =~ m{3});
    return 8 if ($val =~ m{2.*2.*2});
    return 9 if ($val =~ m{2.*2} and $val =~ m{1.*1});
    return 10 if ($val =~ m{1.*1.*1.*1} and $val =~ m{2});
    die "$val does not match any established grouping";
}

Open in new window

0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
wilcoxon - output file has no content at all
0
 
ozoCommented:
#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
my %groups;
unshift @{$groups{join'',sort split//,(split)[0]}},$_ for sort <>;
open STDOUT,">output.txt" or die "output.txt $!";
print @$_,"\n" for values %groups;
0
 
wilcoxonConnect With a Mentor Commented:
An alternate way to do the group checking (probably more efficient than using regexes)...
use strict;
use warnings;
use List::Util qw(sum);
@ARGV or @ARGV=('sample.txt');
open OUT, '>output.txt' or die "could not write output.txt: $!";
print OUT sort col1_rev <>;
sub col1_rev {
    # split the columns and only compare the first one
    my $cola = (split /\s+/, $a)[0];
    my $colb = (split /\s+/, $b)[0];
    my $grpa = group($cola);
    my $grpb = group($colb);
    return $grpa <=> $grpb unless ($grpa == $grpb);
    return $colb <=> $cola;
}
sub group {
    my ($val) = @_;
    my @dig = sort { $b <=> $a } split //, $val;
    die "$val is invalid\n" unless (sum(@dig) == 6);
    return 1 if ($dig[0] == 6);
    return 2 if ($dig[0] == 5 and $dig[1] == 1);
    return 3 if ($dig[0] == 4 and $dig[1] == 2);
    return 4 if ($dig[0] == 4 and $dig[1] == 1 and $dig[2] == 1);
    return 5 if ($dig[0] == 3 and $dig[1] == 3);
    return 6 if ($dig[0] == 3 and $dig[1] == 2 and $dig[2] == 1);
    return 7 if ($dig[0] == 3 and $dig[1] == 1 and $dig[2] == 1 and $dig[3] == 1);
    return 8 if ($dig[0] == 2 and $dig[1] == 2 and $dig[3] == 2);
    return 9 if ($dig[0] == 2 and $dig[1] == 2 and $dig[3] == 1 and $dig[4] == 1);
    return 10 if ($dig[0] == 2 and $dig[1] == 1 and $dig[2] == 1 and $dig[3] == 1 and $dig[4] == 1);
}

Open in new window

0
 
wilcoxonCommented:
Tested both the regex version and the latest variant and they both produce the expected output in output.txt for me.  Is sample.txt empty or is there an error if you run it from the command-line (perl script.pl)?

I hadn't gone back and rethought the groupings (until looking at ozo's latest).  group sub could be simplified further (or removed and written differently) since all values should add up to 6 but it works quickly as-is.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.