Solved

Sort values as text

Posted on 2013-11-28
15
259 Views
Last Modified: 2013-11-30
This code

#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
open STDOUT,">output.txt" or die "output.txt $!";
print reverse sort <>;

Works great. It sorts a list of data, with two columns, into descending order based on the first column.

The problem that I have now, is that I require need further sorting.

the sorted data looks like this:

60000 84
51000 1260
50100 1260
50010 1260
50001 1260
42000 5670
41100 12600
41010 12600
41001 12600
40200 5670
40110 12600
40101 12600
.....
00600 210
00510 2520
00501 2520
00420 9450
00411 21000
00402 9450
Descending order - everything good so far.

The problem is that there are groups within this first column.

All groups containing

6
5,1
4,2
4,1,1
3,3
3,2,1
3,1,1,1
2,2,2
2,2,1,1
2,1,1,1,1

should be grouped together, then put into descending order as before.

Could someone help me by adjusting the above code to perform this ?

thankyou
0
Comment
Question by:MichaelGlancy
  • 5
  • 5
  • 5
15 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 39683779
Could you clarify what the groups would be in the above example?
0
 

Author Comment

by:MichaelGlancy
ID: 39683796
6   - 60000 06000 00600 etc
5,1 51000 50100 50010 15000 etc
4,2 42000 04020 etc
4,1,1
3,3 33000 - 00033
3,2,1
3,1,1,1
2,2,2
2,2,1,1
2,1,1,1,1  21111 12111 1121 etc

Is that clearer ?
0
 
LVL 84

Expert Comment

by:ozo
ID: 39683803
What group would 123456 go into?
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 39683811
When you say "groups containing" do you mean the number contains each of those digits in the order specified (so 5,1 would match 5100 and 5010 but not 1500)?

You can't do this with a simple sort.  You'll have to use a custom sort function as I did on your previous question (though it will be more complex).
0
 

Author Comment

by:MichaelGlancy
ID: 39683815
there isnt a 123456

as per the original sample list, and the described groupings above,

only groups that contain these numbers and only these numbers plus a few zeros.

there should be a total of five digits, the numbers plus zeros.

ie

6   - 60000 06000 00600 etc      and all combinations of these
5,1 - 51000 50100 50010 15000 etc     and all combinations of these
4,2 - 42000 04020 etc and all     combinations of these
4,1,1- 41100       and all combinations of these
3,3 33000 - 00033   and all combinations of these
3,2,1-  32100
3,1,1,1 - 31110 and all combinations of these
2,2,2 - 22200
2,2,1,1 - 22110
2,1,1,1,1  21111 12111 1121 etc  

and so on

does that make it clearer ?
0
 

Author Comment

by:MichaelGlancy
ID: 39683817
When you say "groups containing" do you mean the number contains each of those digits in the order specified (so 5,1 would match 5100 and 5010 but not 1500)?


must be 5 digits, with zeros making up the rest

so 5,1  could be

51000
50100
50010
50001
15000
05100
05010
05001
10500
   
etc

same for all combination
0
 
LVL 84

Expert Comment

by:ozo
ID: 39683830
What group would 12456 go in?
0
6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

 
LVL 26

Expert Comment

by:wilcoxon
ID: 39683838
I hadn't seen your response about the rest 0s so this isn't perfect but it'll be close.
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
open OUT, '>output.txt' or die "could not write output.txt: $!";
print OUT sort col1_rev <>;
sub col1_rev {
    # split the columns and only compare the first one
    my $cola = (split /\s+/, $a)[0];
    my $colb = (split /\s+/, $b)[0];
    my $grpa = group($cola);
    my $grpb = group($colb);
    return $grpa <=> $grpb unless ($grpa == $grpb);
    return $colb <=> $cola;
}
sub group {
    my ($val) = @_;
    return 1 if ($val =~ m{6});
    return 2 if ($val =~ m{5.*1|1.*5});
    return 3 if ($val =~ m{4.*2|2.*4});
    return 4 if ($val =~ m{1.*1} and $val =~ m{4});
    return 5 if ($val =~ m{3.*3});
    return 6 if ($val =~ m{3} and $val =~ m{2} and $val =~ m{1});
    return 7 if ($val =~ m{1.*1.*1} and $val =~ m{3});
    return 8 if ($val =~ m{2.*2.*2});
    return 9 if ($val =~ m{2.*2} and $val =~ m{1.*1});
    return 10 if ($val =~ m{1.*1.*1.*1} and $val =~ m{2});
}

Open in new window

0
 

Author Comment

by:MichaelGlancy
ID: 39683851
There isnt a 123456

These codes (in the first column) are counting the frequency of values in my list. The second column is the count.

there are five columns in this set of data, on each row, 1 2 3 4 and 5

a 0 means there are none of the values I am looking for in that column

I can only have a total of 6 values in each row, so if I find a group of 6 in one column, then there can be no values in any other column so I would have a pattern 6 0 0 0 0, or 0 0 6 0 0 depending on which column the values are found in. If I find a 5 in one column there can only be 1 incidence of another value that meets my search criteria in another column, so 5 in one column, 1 in another and the other three columns must be empty, and so on with 4 and 2, 4 and 1 and 1 etc
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 39683852
Revised for rest must be 0s...  Rerevised since each column can only contain total 6 (mostly the same as previous code).
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
open OUT, '>output.txt' or die "could not write output.txt: $!";
print OUT sort col1_rev <>;
sub col1_rev {
    # split the columns and only compare the first one
    my $cola = (split /\s+/, $a)[0];
    my $colb = (split /\s+/, $b)[0];
    my $grpa = group($cola);
    my $grpb = group($colb);
    return $grpa <=> $grpb unless ($grpa == $grpb);
    return $colb <=> $cola;
}
sub group {
    my ($val) = @_;
    return 1 if ($val =~ m{6});
    return 2 if ($val =~ m{5.*1|1.*5});
    return 3 if ($val =~ m{4.*2|2.*4});
    return 4 if ($val =~ m{1.*1} and $val =~ m{4});
    return 5 if ($val =~ m{3.*3});
    return 6 if ($val =~ m{3} and $val =~ m{2} and $val =~ m{1});
    return 7 if ($val =~ m{1.*1.*1} and $val =~ m{3});
    return 8 if ($val =~ m{2.*2.*2});
    return 9 if ($val =~ m{2.*2} and $val =~ m{1.*1});
    return 10 if ($val =~ m{1.*1.*1.*1} and $val =~ m{2});
    die "$val does not match any established grouping";
}

Open in new window

0
 

Author Comment

by:MichaelGlancy
ID: 39683866
wilcoxon - output file has no content at all
0
 
LVL 84

Expert Comment

by:ozo
ID: 39683869
#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
my %groups;
unshift @{$groups{join'',sort split//,(split)[0]}},$_ for sort <>;
open STDOUT,">output.txt" or die "output.txt $!";
print @$_,"\n" for values %groups;
0
 
LVL 26

Assisted Solution

by:wilcoxon
wilcoxon earned 100 total points
ID: 39683875
An alternate way to do the group checking (probably more efficient than using regexes)...
use strict;
use warnings;
use List::Util qw(sum);
@ARGV or @ARGV=('sample.txt');
open OUT, '>output.txt' or die "could not write output.txt: $!";
print OUT sort col1_rev <>;
sub col1_rev {
    # split the columns and only compare the first one
    my $cola = (split /\s+/, $a)[0];
    my $colb = (split /\s+/, $b)[0];
    my $grpa = group($cola);
    my $grpb = group($colb);
    return $grpa <=> $grpb unless ($grpa == $grpb);
    return $colb <=> $cola;
}
sub group {
    my ($val) = @_;
    my @dig = sort { $b <=> $a } split //, $val;
    die "$val is invalid\n" unless (sum(@dig) == 6);
    return 1 if ($dig[0] == 6);
    return 2 if ($dig[0] == 5 and $dig[1] == 1);
    return 3 if ($dig[0] == 4 and $dig[1] == 2);
    return 4 if ($dig[0] == 4 and $dig[1] == 1 and $dig[2] == 1);
    return 5 if ($dig[0] == 3 and $dig[1] == 3);
    return 6 if ($dig[0] == 3 and $dig[1] == 2 and $dig[2] == 1);
    return 7 if ($dig[0] == 3 and $dig[1] == 1 and $dig[2] == 1 and $dig[3] == 1);
    return 8 if ($dig[0] == 2 and $dig[1] == 2 and $dig[3] == 2);
    return 9 if ($dig[0] == 2 and $dig[1] == 2 and $dig[3] == 1 and $dig[4] == 1);
    return 10 if ($dig[0] == 2 and $dig[1] == 1 and $dig[2] == 1 and $dig[3] == 1 and $dig[4] == 1);
}

Open in new window

0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 39683887
Tested both the regex version and the latest variant and they both produce the expected output in output.txt for me.  Is sample.txt empty or is there an error if you run it from the command-line (perl script.pl)?

I hadn't gone back and rethought the groupings (until looking at ozo's latest).  group sub could be simplified further (or removed and written differently) since all values should add up to 6 but it works quickly as-is.
0
 
LVL 84

Accepted Solution

by:
ozo earned 400 total points
ID: 39683977
If you want the groups ordered, in addition to sorting within the groups:
#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
my %groups;
unshift @{$groups{reverse sort split//,(split)[0]}},$_ for sort <>;
open STDOUT,">output.txt" or die "output.txt $!";
print @{$groups{$_}},"\n" for reverse sort keys %groups;

Another way to do it:
#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
my %groups;
push @{$groups{reverse sort split//,(split)[0]}},$_ while <>;
open STDOUT,">output.txt" or die "output.txt $!";
print reverse(sort @{$groups{$_}}),"\n" for reverse sort keys %groups;

If you don't need the separator between groups:
#!/usr/bin/perl
use strict;
use warnings;
@ARGV or @ARGV=('sample.txt');
open STDOUT,">output.txt" or die "output.txt $!";
print /\0(.*)/s for reverse sort map{join"",reverse("\0",sort split//,(split)[0]),$_} <>;
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

Email validation in proper way is  very important validation required in any web pages. This code is self explainable except that Regular Expression which I used for pattern matching. I originally published as a thread on my website : http://www…
In the distant past (last year) I hacked together a little toy that would allow a couple of Manager types to query, preview, and extract data from a number of MongoDB instances, to their tool of choice: Excel (http://dilbert.com/strips/comic/2007-08…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Get a first impression of how PRTG looks and learn how it works.   This video is a short introduction to PRTG, as an initial overview or as a quick start for new PRTG users.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now