Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

Sorting sub routine

Posted on 2003-11-04
8
Medium Priority
?
441 Views
Last Modified: 2010-03-04
I have the following sub-routine:


#########################################################################
#                                                                                                            #
# subroutine sort_data                                                                              #
# Subroutine that does the actual sort.                                                 #
# Accepts 4 params viz.                                                                         #
# 1. The column number to sort. Column no. start from 0.                         #
# 2. The type of sort numeric or alphabetic. Default is Alphabetic.       #
# 3. The order of sort. Default order is ascending                                    #
# 4. The referrence of the array that needs to be sorted.                         #
#########################################################################

sub sort_data {
                  
      my ($row,$type,$sort_order,$r_data) = @_;
      my (@array);
      
      if ($type eq "n") {
            if ($sort_order eq "d") {
                  @array = map { (split ('<->', $_))[1] }
                        reverse sort {$a <=> $b}
                        map { join ('<->', lc ((split('\|', $_))[$row]) , $_) }
                        @{$r_data};
            }
            else {
                  @array = map { (split ('<->', $_))[1] }
                        sort {$a <=> $b}
                        map { join ('<->', lc ((split('\|', $_))[$row]) , $_) }
                        @{$r_data};
            }
      }
      else {
            if ($sort_order eq "d") {
                  @array = map { (split ('<->', $_))[1] }
                        reverse sort {$a cmp $b}
                        map { join ('<->', lc ((split('\|', $_))[$row]) , $_) }
                        @{$r_data};
            }
            else {
                  @array = map { (split ('<->', $_))[1] }
                        sort {$a cmp $b}
                        map { join ('<->', lc ((split('\|', $_))[$row]) , $_) }
                        @{$r_data};
            }
      }

      return (\@array)
}

Someone rewrote it for me to make it more efficient and to make the warning that did showup in the error log file the moment I added the -W option to the perl command.

#########################################################################
#                                                                                                            #
# subroutine sort_data                                                                              #
# Subroutine that does the actual sort.                                                 #
# Accepts 4 params viz.                                                                         #
# 1. The column number to sort. Column no. start from 0.                         #
# 2. The type of sort numeric or alphabetic. Default is Alphabetic.       #
# 3. The order of sort. Default order is ascending                                    #
# 4. The referrence of the array that needs to be sorted.                         #
#########################################################################

sub sort_data {
                  
      my ($row,$type,$sort_order,$r_data) = @_;
      my (@array);
      
      if ($type eq "n") {
            if ($sort_order eq "d") {
                  @array = map { $_->[1] }
                        sort {$b->[0] <=> $a->[0]}
                        map { [ lc ((split('|', $_))[$row]) , $_] }
                        @{$r_data};
            }
            else {
                  @array = map { $_->[1] }
                        sort {$a->[0] <=> $b->[0]}
                        map { [ lc ((split('|', $_))[$row]) , $_] }
                        @{$r_data};
            }
      }
      else {
            if ($sort_order eq "d") {
                  @array = map { $_->[1] }
                        sort {$b->[0] cmp $a->[0]}
                        map { [ lc ((split('|', $_))[$row]) , $_] }
                        @{$r_data};
            }
            else {
                  @array = map { $_->[1] }
                        sort {$a->[0] cmp $b->[0]}
                        map { [ lc ((split('|', $_))[$row]) , $_] }
                        @{$r_data};
            }
      }

      return (\@array)
}

But it isn't working like the old version is! It isn't the same. What is different?

Forexample:

$row = 0;
$type = "n";
$sort_order = "a";

will work in the old version but not in the new version. Trying to learn perl by learning what is wrong and why. Can someone help me to fix the new code?
0
Comment
Question by:weversbv
  • 3
  • 2
  • 2
  • +1
8 Comments
 
LVL 85

Expert Comment

by:ozo
ID: 9681758
 split('|', $_)
should be
  split('\|', $_)
0
 
LVL 18

Expert Comment

by:kandura
ID: 9681902
It doesn't work with numerical sorting, because the Schwartzian Transform uses '|' to split the rows,
but this is interpreted as an 'or' regexp. The first element in the anonymous array will always contain
only the $row-th character in the data.
I'm pretty sure your alphabetical sort would show the same problem!

You could write the whole routine a lot prettier by using the ? : operator to select two alternatives,
and by realizing that <=> and cmp return 1,0,-1 to tell how the two elements compare.

So I would write this:

sub sort_data
{
      my ($row,$type,$sort_order,$r_data) = @_;

      # $order will contain 1 for ascending and -1 for descending sort
      my $order = $sort_order eq 'd' ? -1 : 1;                    
      
      my @array = map { $_->[1] }
            sort { ( $type eq 'n' ?                              # is it a numerical sort?
                        $a->[0] <=> $b->[0] :                  # then use <=> operator
                        $a->[0] cmp $b->[0] )                  # else do lexical sort
                          * $order                        # multiply by $order to change the direction
                                                        # of the sort
                  }  map {
                        [ lc ((split('\|', $_))[$row]) , $_]      # use a Schwartzian Transform on the input.
                  } @$r_data;
      
      return \@array;
}
0
 
LVL 20

Expert Comment

by:jmcg
ID: 9684783
One good thing about the two versions of your code is that they keep the sort comparison routine short and simple. Kandura undoes much of the virtue of the Schwartzian transform by calculating too much in the compare routine.

Ozo has spotted the essential flaw in the rewriting. Although split will take strings as its first argument, the only string that is supposed to be given to it is one containing a single space. Other than that, you should be passing a regex to match. So all the places where your rewrite has

   split('|', $_)

you should have

   split( /\|/, $_)

or, just a slight performance improvement, since you only need to split enough times to get to the field (why are you calling it a "row"???) of interest.

  split( /\|/, $_, $row+2)

In a similar vein, you could forego the call to the 'lc' operator when doing numeric searches.

0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 

Author Comment

by:weversbv
ID: 9684967
jmcg:

or, just a slight performance improvement, since you only need to split enough times to get to the field (why are you calling it a "row"???) of interest.

  split( /\|/, $_, $row+2)

In a similar vein, you could forego the call to the 'lc' operator when doing numeric searches.

???

Could you help me to an example of what you mean?
0
 
LVL 18

Expert Comment

by:kandura
ID: 9685327
jmcg: you are right. In an attempt to write prettier code, I sacrified performance. I didn't think 1 'if' and 1 multiplication would matter much, but of course it adds up.

weversbv: what jmcg means is that:
1.  you don't have to split every row completely, because you only want the one column. So there's no need to split the remaining columns after that;
2.  you don't have to convert the column you're looking for to lowercase when you're sorting numerically.

To put it together, here's your new routine:

sub sort_data {
               
     my ($row,$type,$sort_order,$r_data) = @_;
     my (@array);
     
     if ($type eq "n") {
          if ($sort_order eq "d") {
               @array = map { $_->[1] }
                    sort {$b->[0] <=> $a->[0]}
                    map { [ (split(/\|/, $_, $row+2))[$row] , $_] }          #  <-- leave out the call to lc( )
                    @{$r_data};
          }
          else {
               @array = map { $_->[1] }
                    sort {$a->[0] <=> $b->[0]}
                    map { [ (split(/\|/, $_, $row+2))[$row] , $_] }          #  <-- same here
                    @{$r_data};
          }
     }
     else {
          if ($sort_order eq "d") {
               @array = map { $_->[1] }
                    sort {$b->[0] cmp $a->[0]}
                    map { [ lc ((split(/\|/, $_, $row+2))[$row]) , $_] }
                    @{$r_data};
          }
          else {
               @array = map { $_->[1] }
                    sort {$a->[0] cmp $b->[0]}
                    map { [ lc ((split(/\|/, $_, $row+2))[$row]) , $_] }
                    @{$r_data};
          }
     }

     return (\@array)
}
0
 

Author Comment

by:weversbv
ID: 9685606
I do know understand what you mean.

Is it possible to sort nummerical whenever it is possible.
So if the row to sort contains only nummerical values I woudl like to sort nummerical.
If not sort alphabetic. Is this possible?
0
 
LVL 20

Expert Comment

by:jmcg
ID: 9688430
If your array isn't very large (how large is that? I'm not sure: 1000? 5000?), you can combine numerical and lexical sort this way:

    sort { ($a->[0] <=> $b->[0]) || ($a->[0] cmp $b->[0]) } ...

As soon as you start to do something which requires examining each term and making decisions in the comparison routine, you slow things down. As the array being sorted gets larger, it becomes increasingly important to keep the comparison routine simple. And, eventually, you may need to use an external sort program.

For a combined sort such as you are asking for, on something sufficiently large, it may be worthwhile to convert the numbers into zero-filled fields large enough to represent the largest number present. Zero-filled fields, ie, 000123 instead of just 123, allow you to use a lexical sort for numbers.
0
 
LVL 18

Accepted Solution

by:
kandura earned 500 total points
ID: 9688498
Yes, that's possible. It means you have to investigate the data in the column before sorting.
That is, loop through the requested column. If you encounter a non-numerical value, do lexical sort, otherwise it'll be a numerical sort.

sub sort_data_check {
      my ($row,$type,$sort_order,$r_data) = @_;
      my @array = ();
      my $sort_type =  'n';      # I use this to get the sort type from the data

        # here I loop through the data, extracting the requested column
        # and investigating its type
        # note that this takes care of the first step in the Schwartzian Transform
      foreach (@$r_data)
      {
            my $k = (split(/\|/, $_, $row+2))[$row];     # this gets your column
            if($k==0 && $k ne '0') {
                  $sort_type = 'a';                                # it's not a number
            }
            push @array, [$k, $_];
      }

      if($sort_type ne 'n')                                               # convert the data to lower case
      {
            @array = map { lc($_->[0]); $_ } @array;
      }

      if ($sort_type eq "n") {
            if ($sort_order eq "d") {
                  @array = map { $_->[1] }
                        sort {$b->[0] <=> $a->[0]}
                        @array;
            } else {
                  @array = map { $_->[1] }
                        sort {$a->[0] <=> $b->[0]}
                        @array;
            }
      } else {
            if ($sort_order eq "d") {
                  @array = map { $_->[1] }
                        sort {$b->[0] cmp $a->[0]}
                                @array;
            } else {
                  @array = map { $_->[1] }
                        sort {$a->[0] cmp $b->[0]}
                        @array;
            }
      }

      return (\@array)
}



 
0

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I've just discovered very important differences between Windows an Unix formats in Perl,at least 5.xx.. MOST IMPORTANT: Use Unix file format while saving Your script. otherwise it will have ^M s or smth likely weird in the EOL, Then DO NOT use m…
On Microsoft Windows, if  when you click or type the name of a .pl file, you get an error "is not recognized as an internal or external command, operable program or batch file", then this means you do not have the .pl file extension associated with …
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans

926 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question