Solved

Sorting sub routine

Posted on 2003-11-04
8
432 Views
Last Modified: 2010-03-04
I have the following sub-routine:


#########################################################################
#                                                                                                            #
# subroutine sort_data                                                                              #
# Subroutine that does the actual sort.                                                 #
# Accepts 4 params viz.                                                                         #
# 1. The column number to sort. Column no. start from 0.                         #
# 2. The type of sort numeric or alphabetic. Default is Alphabetic.       #
# 3. The order of sort. Default order is ascending                                    #
# 4. The referrence of the array that needs to be sorted.                         #
#########################################################################

sub sort_data {
                  
      my ($row,$type,$sort_order,$r_data) = @_;
      my (@array);
      
      if ($type eq "n") {
            if ($sort_order eq "d") {
                  @array = map { (split ('<->', $_))[1] }
                        reverse sort {$a <=> $b}
                        map { join ('<->', lc ((split('\|', $_))[$row]) , $_) }
                        @{$r_data};
            }
            else {
                  @array = map { (split ('<->', $_))[1] }
                        sort {$a <=> $b}
                        map { join ('<->', lc ((split('\|', $_))[$row]) , $_) }
                        @{$r_data};
            }
      }
      else {
            if ($sort_order eq "d") {
                  @array = map { (split ('<->', $_))[1] }
                        reverse sort {$a cmp $b}
                        map { join ('<->', lc ((split('\|', $_))[$row]) , $_) }
                        @{$r_data};
            }
            else {
                  @array = map { (split ('<->', $_))[1] }
                        sort {$a cmp $b}
                        map { join ('<->', lc ((split('\|', $_))[$row]) , $_) }
                        @{$r_data};
            }
      }

      return (\@array)
}

Someone rewrote it for me to make it more efficient and to make the warning that did showup in the error log file the moment I added the -W option to the perl command.

#########################################################################
#                                                                                                            #
# subroutine sort_data                                                                              #
# Subroutine that does the actual sort.                                                 #
# Accepts 4 params viz.                                                                         #
# 1. The column number to sort. Column no. start from 0.                         #
# 2. The type of sort numeric or alphabetic. Default is Alphabetic.       #
# 3. The order of sort. Default order is ascending                                    #
# 4. The referrence of the array that needs to be sorted.                         #
#########################################################################

sub sort_data {
                  
      my ($row,$type,$sort_order,$r_data) = @_;
      my (@array);
      
      if ($type eq "n") {
            if ($sort_order eq "d") {
                  @array = map { $_->[1] }
                        sort {$b->[0] <=> $a->[0]}
                        map { [ lc ((split('|', $_))[$row]) , $_] }
                        @{$r_data};
            }
            else {
                  @array = map { $_->[1] }
                        sort {$a->[0] <=> $b->[0]}
                        map { [ lc ((split('|', $_))[$row]) , $_] }
                        @{$r_data};
            }
      }
      else {
            if ($sort_order eq "d") {
                  @array = map { $_->[1] }
                        sort {$b->[0] cmp $a->[0]}
                        map { [ lc ((split('|', $_))[$row]) , $_] }
                        @{$r_data};
            }
            else {
                  @array = map { $_->[1] }
                        sort {$a->[0] cmp $b->[0]}
                        map { [ lc ((split('|', $_))[$row]) , $_] }
                        @{$r_data};
            }
      }

      return (\@array)
}

But it isn't working like the old version is! It isn't the same. What is different?

Forexample:

$row = 0;
$type = "n";
$sort_order = "a";

will work in the old version but not in the new version. Trying to learn perl by learning what is wrong and why. Can someone help me to fix the new code?
0
Comment
Question by:weversbv
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
  • 2
  • +1
8 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 9681758
 split('|', $_)
should be
  split('\|', $_)
0
 
LVL 18

Expert Comment

by:kandura
ID: 9681902
It doesn't work with numerical sorting, because the Schwartzian Transform uses '|' to split the rows,
but this is interpreted as an 'or' regexp. The first element in the anonymous array will always contain
only the $row-th character in the data.
I'm pretty sure your alphabetical sort would show the same problem!

You could write the whole routine a lot prettier by using the ? : operator to select two alternatives,
and by realizing that <=> and cmp return 1,0,-1 to tell how the two elements compare.

So I would write this:

sub sort_data
{
      my ($row,$type,$sort_order,$r_data) = @_;

      # $order will contain 1 for ascending and -1 for descending sort
      my $order = $sort_order eq 'd' ? -1 : 1;                    
      
      my @array = map { $_->[1] }
            sort { ( $type eq 'n' ?                              # is it a numerical sort?
                        $a->[0] <=> $b->[0] :                  # then use <=> operator
                        $a->[0] cmp $b->[0] )                  # else do lexical sort
                          * $order                        # multiply by $order to change the direction
                                                        # of the sort
                  }  map {
                        [ lc ((split('\|', $_))[$row]) , $_]      # use a Schwartzian Transform on the input.
                  } @$r_data;
      
      return \@array;
}
0
 
LVL 20

Expert Comment

by:jmcg
ID: 9684783
One good thing about the two versions of your code is that they keep the sort comparison routine short and simple. Kandura undoes much of the virtue of the Schwartzian transform by calculating too much in the compare routine.

Ozo has spotted the essential flaw in the rewriting. Although split will take strings as its first argument, the only string that is supposed to be given to it is one containing a single space. Other than that, you should be passing a regex to match. So all the places where your rewrite has

   split('|', $_)

you should have

   split( /\|/, $_)

or, just a slight performance improvement, since you only need to split enough times to get to the field (why are you calling it a "row"???) of interest.

  split( /\|/, $_, $row+2)

In a similar vein, you could forego the call to the 'lc' operator when doing numeric searches.

0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:weversbv
ID: 9684967
jmcg:

or, just a slight performance improvement, since you only need to split enough times to get to the field (why are you calling it a "row"???) of interest.

  split( /\|/, $_, $row+2)

In a similar vein, you could forego the call to the 'lc' operator when doing numeric searches.

???

Could you help me to an example of what you mean?
0
 
LVL 18

Expert Comment

by:kandura
ID: 9685327
jmcg: you are right. In an attempt to write prettier code, I sacrified performance. I didn't think 1 'if' and 1 multiplication would matter much, but of course it adds up.

weversbv: what jmcg means is that:
1.  you don't have to split every row completely, because you only want the one column. So there's no need to split the remaining columns after that;
2.  you don't have to convert the column you're looking for to lowercase when you're sorting numerically.

To put it together, here's your new routine:

sub sort_data {
               
     my ($row,$type,$sort_order,$r_data) = @_;
     my (@array);
     
     if ($type eq "n") {
          if ($sort_order eq "d") {
               @array = map { $_->[1] }
                    sort {$b->[0] <=> $a->[0]}
                    map { [ (split(/\|/, $_, $row+2))[$row] , $_] }          #  <-- leave out the call to lc( )
                    @{$r_data};
          }
          else {
               @array = map { $_->[1] }
                    sort {$a->[0] <=> $b->[0]}
                    map { [ (split(/\|/, $_, $row+2))[$row] , $_] }          #  <-- same here
                    @{$r_data};
          }
     }
     else {
          if ($sort_order eq "d") {
               @array = map { $_->[1] }
                    sort {$b->[0] cmp $a->[0]}
                    map { [ lc ((split(/\|/, $_, $row+2))[$row]) , $_] }
                    @{$r_data};
          }
          else {
               @array = map { $_->[1] }
                    sort {$a->[0] cmp $b->[0]}
                    map { [ lc ((split(/\|/, $_, $row+2))[$row]) , $_] }
                    @{$r_data};
          }
     }

     return (\@array)
}
0
 

Author Comment

by:weversbv
ID: 9685606
I do know understand what you mean.

Is it possible to sort nummerical whenever it is possible.
So if the row to sort contains only nummerical values I woudl like to sort nummerical.
If not sort alphabetic. Is this possible?
0
 
LVL 20

Expert Comment

by:jmcg
ID: 9688430
If your array isn't very large (how large is that? I'm not sure: 1000? 5000?), you can combine numerical and lexical sort this way:

    sort { ($a->[0] <=> $b->[0]) || ($a->[0] cmp $b->[0]) } ...

As soon as you start to do something which requires examining each term and making decisions in the comparison routine, you slow things down. As the array being sorted gets larger, it becomes increasingly important to keep the comparison routine simple. And, eventually, you may need to use an external sort program.

For a combined sort such as you are asking for, on something sufficiently large, it may be worthwhile to convert the numbers into zero-filled fields large enough to represent the largest number present. Zero-filled fields, ie, 000123 instead of just 123, allow you to use a lexical sort for numbers.
0
 
LVL 18

Accepted Solution

by:
kandura earned 125 total points
ID: 9688498
Yes, that's possible. It means you have to investigate the data in the column before sorting.
That is, loop through the requested column. If you encounter a non-numerical value, do lexical sort, otherwise it'll be a numerical sort.

sub sort_data_check {
      my ($row,$type,$sort_order,$r_data) = @_;
      my @array = ();
      my $sort_type =  'n';      # I use this to get the sort type from the data

        # here I loop through the data, extracting the requested column
        # and investigating its type
        # note that this takes care of the first step in the Schwartzian Transform
      foreach (@$r_data)
      {
            my $k = (split(/\|/, $_, $row+2))[$row];     # this gets your column
            if($k==0 && $k ne '0') {
                  $sort_type = 'a';                                # it's not a number
            }
            push @array, [$k, $_];
      }

      if($sort_type ne 'n')                                               # convert the data to lower case
      {
            @array = map { lc($_->[0]); $_ } @array;
      }

      if ($sort_type eq "n") {
            if ($sort_order eq "d") {
                  @array = map { $_->[1] }
                        sort {$b->[0] <=> $a->[0]}
                        @array;
            } else {
                  @array = map { $_->[1] }
                        sort {$a->[0] <=> $b->[0]}
                        @array;
            }
      } else {
            if ($sort_order eq "d") {
                  @array = map { $_->[1] }
                        sort {$b->[0] cmp $a->[0]}
                                @array;
            } else {
                  @array = map { $_->[1] }
                        sort {$a->[0] cmp $b->[0]}
                        @array;
            }
      }

      return (\@array)
}



 
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Email validation in proper way is  very important validation required in any web pages. This code is self explainable except that Regular Expression which I used for pattern matching. I originally published as a thread on my website : http://www…
A year or so back I was asked to have a play with MongoDB; within half an hour I had downloaded (http://www.mongodb.org/downloads),  installed and started the daemon, and had a console window open. After an hour or two of playing at the command …
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans

726 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question