Sorting a flat file in Unix

I asked this question before but did not get an answer I could use.
I have a flat file that looks somewhat like this

Field A    Field B   Field C
--------   --------   --------
A123B    RANDO  123
B120T    MRAND  567
M234K   OMRAN  678
A123B   DOMRA  999

I use a custom sort function that extracts the fields it needs and compares them. Until now, the only ordering I needed was by the last digit of Field A, and then:
if (tmp == 0)
  compare the Field C values
then return the tmp variable.
However, a new requirement has been added and I am not sure how to implement it.
I need to (hopefully without redoing the way I currently sort) keep the same ordering, except that records whose Field A is exactly the same must all end up together, still sub-sorted by Field C. Otherwise, records need to be sorted the way they have been: last digit of Field A first, then Field C.

Well, I am using simplified data for you, but here is a sketch of the simplified pseudocode.

$FIELDA_1 = substr($a,1,5);
$FIELDA_2 = substr($b,1,5);

$FIELDC_1 = substr($a,14,3);
$FIELDC_2 = substr($b,14,3);

#reverse FIELD A
$Reverse_1 = ($1) if $FIELDA_1 =~ /(\d+)/;
$Reverse_2 = ($1) if $FIELDA_2 =~ /(\d+)/;

$FirstCharacter_1 = substr ($Reverse_1, length($FIELDA_1) - 1);
$FirstCharacter_2 = substr ($Reverse_2, length($FIELDA_2) - 1);

$tmp = $FirstCharacter_1 <=> $FirstCharacter_2;

if ($tmp == 0) {
    # Tie on Field A's digit: fall back to a numeric comparison of Field C.
    $tmp = $FIELDC_1 <=> $FIELDC_2;
}

return $tmp;

I would prefer to keep my code the same for the most part. If you suggest using something like map, I can do that, but you will have to give more detail since I am not familiar with it.
Is this also acceptable output?:
M23M4B  OMRAN     78
A13B    MRA         96
A13B    DOMRA     999
A123B  RANDO      123

I think you first need to break it up. First sort by the last character of the first field (resolving ties on the first field).

That turns my sample data above (from my previous post) into:
B120T       MRANA     67
B120T       MRAND     223
A123B       RANDO     123
A13B         DOMRA     999
A13B         MRA           96
M23M4B  OMRAN     78

then sort each sub group:

{B120T       MRANA     67,
B120T       MRAND     223}
{A123B       RANDO     123,
A13B         DOMRA     999,
A13B         MRA           96,
M23M4B  OMRAN     78}

What if you store the records in some sort of linked structure? The next pointer points to another record that has an identical field 1.

typedef struct ARecord {
   char* field1;
   char* field2;
   int field3;
   struct ARecord* next;
} Record;
Record table[N];

table[0] = {A123B       RANDO     123   NULL}
table[1] = {A13B         DOMRA     999   NULL}
table[2] = {A13B         MRA           96   NULL}
table[3] = {M23M4B  OMRAN     78   NULL}

After linking the records that share field 1, the table becomes:
table[0] = {A123B       RANDO     123   NULL}
table[1] = {A13B         DOMRA     999   {A13B      MRA     96   NULL}}
table[2] = {M23M4B  OMRAN     78   NULL}

Start at the last entry and work backwards (copy_entry and delete_entry are helpers you would write):
for (int i = N - 1; i > 0; i--)
  if (strcmp(table[i].field1, table[i-1].field1) == 0) {
      table[i-1].next = copy_entry(table[i]);
      delete_entry(table, i);
  }

then to sort each subgroup:
for (0 to N)
   // sort each list so:
   // table[i].field3 < table[i].next->field3 ...
   sort table[i](from table[i].field3 to table[i].next->field3 ...)

// table[1] = {A13B         DOMRA     999   {A13B      MRA     96   NULL}}
// becomes:
// table[1] = {A13B      MRA     96   {A13B         DOMRA     999   NULL}}

// finally, sort the table itself by the field3 of each list head:
sort table[0 .. N-1] by field3

table[0] = {M23M4B  OMRAN     78   NULL}
table[1] = {A13B      MRA     96   {A13B         DOMRA     999   NULL}}
table[2] = {A123B       RANDO     123   NULL}

# A side note: the following sed command is meant to extract the last character of the first field
# and paste it onto the end of each line. My experiments on the command line didn't quite work.
sed "s/^\w\+\(\w\)[ \t]\w\+[ \t]\w\+$/\1/" < DATAFILE | paste DATAFILE -
Nguyen Huu Phuoc (Senior Manager) commented:
I think you should combine Field A and Field C into a single string for each record.
When you sort by that string ($temp), you will get the order you want.
I hope this helps.
Phuoc H. Nguyen
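
To make the combined-key idea concrete, here is a minimal sketch using awk and sort rather than the asker's Perl. It covers only the original rule (Field A's digit, then Field C). The function name sort_by_combined_key is made up for illustration, and it assumes whitespace-separated fields, a non-negative integer Field C, and that "last digit" means the last non-zero digit of Field A.

```shell
# Sketch: build one sortable string per record (key digit + zero-padded
# Field C), sort on it as plain text, then strip the key again.
sort_by_combined_key() {
    awk '{
        d = $1
        gsub(/[^1-9]/, "", d)                       # keep only non-zero digits of Field A
        k = (d == "") ? "0" : substr(d, length(d))  # last remaining digit (assumed rule)
        printf "%s%09d\t%s\n", k, $3, $0            # combined key, a tab, the original line
    }' | LC_ALL=C sort -k1,1 | cut -f2-
}

# Sample records from the question, fed on standard input.
sort_by_combined_key <<'EOF'
A123B    RANDO  123
B120T    MRAND  567
M234K    OMRAN  678
A123B    DOMRA  999
EOF
```

Zero-padding Field C is what lets a plain string comparison agree with the numeric order; without it, "96" would sort after "123".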
HonorGod (Software Engineer) commented:
  sub byField {
    $a1 = substr( $a, 1, 3 );  # Field A (number) - record 1
    $c1 = substr( $a, 12 );    # Field C          - record 1
    $a2 = substr( $b, 1, 3 );  # Field A (number) - record 2
    $c2 = substr( $b, 12 );    # Field C          - record 2
    if ( $a1 == $a2 ) {        # Are A Fields numerically equal?
      $c1 <=> $c2;             #   yes: compare Field C
    } else {                   #
      $a1 <=> $a2;             #   no: compare the Field A numbers
    }                          #
  }

  $filename = "data.txt";

  open( DATA, "<$filename" ) or die "Unable to open $filename. $!";
  chomp( @data = <DATA> );
  close( DATA );
  print "----+----1----+----2\n";
  foreach $line ( @data ) {
    $line =~ s/  +/|/g;
    print "$line\n";
  }
  print "\n\n----+----1----+----2\n";
  @info = sort byField @data;
  foreach $line ( @info ) {
     $line =~ s/\|/  /;
     $line =~ s/\|/ /;
     print "$line\n";
  }

Unix has an awesome sort utility!

The C code below runs the sort program; it is equivalent to this command line:
sort -k 1.5,1.5d -k 3,3n FILENAME > sorted

It sorts the file FILENAME by the 5th character of the first field (-k 1.5,1.5d), resolves ties numerically on the third field (-k 3,3n), and redirects the output to a flat text file named sorted. Using this method might be a bit far-fetched, but you can also play around with the arguments, say, if tmp == 0 then change your args to do something else, etc. Of course, it's probably a good idea to move the following code out of main and into a function.
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>

int main(int argc, char** argv)
{
char * args[] = {"sort", "-k", "1.5,1.5d", "-k", "3,3n", "FILENAME", NULL};
pid_t pid=0;
int status=0;
int fd=0;

fd = open("sorted", O_WRONLY | O_TRUNC | O_CREAT, 0666); /* output goes here */
if (!(pid = fork())) {
        dup2(fd,1); /* point the child's stdout at the file */
/*    printf("In the child\n"); */
        execv("/bin/sort", args); /* run sort with the above args; adjust the path if needed */
        _exit(127); /* only reached if execv fails */
}

waitpid(pid, &status, 0); /* wait for the sort to finish */

/* continue on with your program */

return 0;
}
feldmani (Author) commented:
poid99, this is ALMOST what I am looking for. However, the position of the character is not always constant. As I said afterwards, the sort has to be by the LAST character of Field A, NOT by the fifth character. Basically, I have to take the substring, get rid of the zeros, and then sort.
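
One possible reading of that rule, sketched as a small shell helper: drop everything from Field A except the digits 1 to 9 and keep the last one left. The name field_a_key is hypothetical, and the "last non-zero digit" interpretation is an assumption; the thread never pins the rule down exactly.

```shell
# Sketch: derive the sort key from Field A by deleting every character
# that is not a digit 1-9 and keeping the final remaining character.
field_a_key() {
    printf '%s' "$1" | tr -cd '1-9' | tail -c 1
}

# Show the key for a few Field A values from this thread.
for a in A123B B120T M23M4B; do
    printf '%s -> %s\n' "$a" "$(field_a_key "$a")"
done
```

Because the key is computed from the whole field, it no longer matters whether the digit sits in the fifth position or anywhere else.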
say the datafile =
A123B       RANDO     123
A13B         DOMRA     999
B120T       MRANA     67
B120T       MRAND     223
A13B         MRA           96
M23M4B  OMRAN     78

What should the output be?
feldmani (Author) commented:
B120T       MRANA     67
B120T       MRAND     223

A13B    MRA         96
A13B    DOMRA     999

A123B  RANDO      123

M23M4B  OMRAN     78

That is what the output should look like.
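
For what it's worth, that ordering can be reproduced with a decorate, sort, undecorate pipeline. This is a sketch under assumptions, not the verified solution: the function name group_sort is made up, fields are assumed to be whitespace-separated with no header lines, the Field A key is taken to be its last non-zero digit, and groups with the same key are assumed to be ordered by their smallest Field C (which matches the output above but is not stated in the thread).

```shell
# Sketch: prefix every record with (key digit, smallest Field C of its
# Field A group, Field A, Field C), sort on those four columns, then
# remove the prefix so only the original lines remain.
group_sort() {
    awk '{
        line[NR] = $0; a[NR] = $1; c[NR] = $3 + 0
        if (!($1 in min) || c[NR] < min[$1]) min[$1] = c[NR]   # per-group minimum
    }
    END {
        for (i = 1; i <= NR; i++) {
            d = a[i]
            gsub(/[^1-9]/, "", d)                        # keep non-zero digits only
            k = (d == "") ? 0 : substr(d, length(d))     # last remaining digit
            printf "%s\t%d\t%s\t%d\t%s\n", k, min[a[i]], a[i], c[i], line[i]
        }
    }' | LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k2,2n -k3,3 -k4,4n | cut -f5-
}

# The data file from this thread, fed on standard input.
group_sort <<'EOF'
A123B       RANDO     123
A13B         DOMRA     999
B120T       MRANA     67
B120T       MRAND     223
A13B         MRA           96
M23M4B  OMRAN     78
EOF
```

Sorting on the group minimum before Field A is what keeps identical Field A values together while still placing each group where its first record would have landed under the old rule.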