  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 340

sorting a flat file in Unix

I asked this question before but did not get an answer I could use.
I have a flat file that looks somewhat like this:

Field A    Field B    Field C
--------   --------   --------
A123B      RANDO      123
B120T      MRAND      567
M234K      OMRAN      678
A123B      DOMRA      999

I use a custom sort function to extract the needed fields and sort them correctly. Up until now, the only sort I needed was by the last digit of Field A, then:
if (tmp == 0)
  compare the Field C values
then return the tmp variable.
However, a new requirement has been added and I am not sure how to implement it.
I need to sort the same way (hopefully without redoing the way I currently sort), but if Field A is EXACTLY the same on several records, those records should all be kept together and still be sub-sorted by Field C. Otherwise, the file needs to be sorted the way it has been sorted: last digit of Field A first, then Field C.

Well, I am using simplified data here, but this is a sketch of the simplified pseudocode:
sub custom_sort {

# note: Perl's substr() offsets are 0-based; adjust them to the real record layout
$FIELDA_1 = substr($a,1,5);
$FIELDA_2 = substr($b,1,5);

$FIELDC_1 = substr($a,14,3);
$FIELDC_2 = substr($b,14,3);

# pull the digits out of FIELD A
$Reverse_1 = ($1) if $FIELDA_1 =~ /(\d+)/;
$Reverse_2 = ($1) if $FIELDA_2 =~ /(\d+)/;

# last digit of each digit string
$FirstCharacter_1 = substr ($Reverse_1, -1);
$FirstCharacter_2 = substr ($Reverse_2, -1);

$tmp = $FirstCharacter_1 <=> $FirstCharacter_2;

# tie on the last digit: fall back to Field C
if ($tmp == 0) {
    if ($FIELDC_1 > $FIELDC_2) {
        $tmp = 1;
    }
    elsif ($FIELDC_1 < $FIELDC_2) {
        $tmp = -1;
    }
    else {
        $tmp = 0;
    }
}

return $tmp;
}

I would prefer to keep my code mostly the same, but if you suggest using something like map I can do that; you will have to give more detail, since I am not familiar with it.
Asked by: feldmani
 
phuocnh commented:
I think you should combine Field A and Field C into a single string for each record:
$temp = $fielda . $fieldc;
When you sort by $temp you will get the order you want.
I hope this helps.
Phuoc H. Nguyen
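
A minimal Python sketch of the combined-key idea above, assuming whitespace-separated records like the question's sample. The helper name combined_key and the padding width of 10 are illustrative assumptions, not part of the suggestion itself; the zero-padding is there because plain string concatenation would otherwise compare Field C as text, so that "999" would sort after "1000".

```python
def combined_key(line, width=10):
    """Build one sortable string from Field A and Field C (hypothetical helper)."""
    fields = line.split()
    field_a, field_c = fields[0], fields[-1]
    # Zero-pad Field C so that string comparison agrees with numeric order.
    return field_a + field_c.zfill(width)


records = [
    "A123B    RANDO  123",
    "B120T    MRAND  567",
    "M234K    OMRAN  678",
    "A123B    DOMRA  999",
]

for line in sorted(records, key=combined_key):
    print(line)
```

This keeps identical Field A values together, but it orders the groups by the whole of Field A rather than by its last digit.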
 
HonorGod commented:
 sub byField {
    $a1 = substr( $a, 1, 3 );  # Field A (number) - record 1
    $c1 = substr( $a, 12 );    # Field C          - record 1
    $a2 = substr( $b, 1, 3 );  # Field A (number) - record 2
    $c2 = substr( $b, 12 );    # Field C          - record 2
    if ( $a1 == $a2 ) {        # Are A Fields numerically equal?
      $c1 <=> $c2;             #
    } else {                   #
      $a1 <=> $a2;             #
    }                          #
  }

  $filename = "data.txt";

  open( my $fh, '<', $filename ) or die "Unable to open $filename. $!";
  chomp( @data = <$fh> );
  close( $fh );
  print "----+----1----+----2\n";
  foreach $line ( @data ) {
    $line =~ s/  +/|/g;
    print "$line\n";
  }
  print "\n\n----+----1----+----2\n";
  @info = sort byField @data;
  foreach $line ( @info ) {
     $line =~ s/\|/  /;
     $line =~ s/\|/ /;
     print "$line\n";
  }
 
poid99 commented:
Unix has an awesome sort utility!

The following C code runs the sort program, equivalent to this on the command line:
sort -k 1.5,1.5d -k 3,3n FILENAME > sorted

It sorts FILENAME by the 5th character of the first field (-k 1.5,1.5d), resolves ties numerically on the third field (-k 3,3n), and redirects the output to the flat text file sorted. Calling sort this way may be a bit far-fetched; you can also play around with the arguments, for example changing them when tmp == 0. It is probably a good idea to move the following code out of main and into a function.
----------------------------------------------------
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>

int
main(int argc, char** argv)
{
char * args[] = {"sort", "-k", "1.5,1.5d", "-k", "3,3n", "FILENAME", NULL};
pid_t pid=0;
int status=0;
int fd=0;

fd = open("sorted", O_WRONLY | O_TRUNC | O_CREAT, 0666); /* output goes here */
if (fd < 0)
        return 1; /* could not create the output file */

if (!(pid = fork()))
{
        dup2(fd,1); /* redirect the child's stdout to the file */
/*    printf("In the child\n"); */
        execvp("sort", args); /* run the sort command (found via PATH) with the above args */
        _exit(127); /* only reached if the exec fails */
}

waitpid(pid, &status, 0); /* wait for the sort to finish */
close(fd);

/* continue on with your program */

return 0;
}
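
For comparison, a rough pure-Python approximation of the ordering that sort -k 1.5,1.5d -k 3,3n produces, assuming whitespace-separated fields. The function name sort_k_key is hypothetical, and the sketch ignores locale rules and the finer points of the d (dictionary order) flag.

```python
def sort_k_key(line):
    """Approximate the key of: sort -k 1.5,1.5d -k 3,3n (hypothetical helper)."""
    fields = line.split()
    fifth_char = fields[0][4:5]     # -k 1.5,1.5: 5th character of field 1 ("" if too short)
    third_numeric = int(fields[2])  # -k 3,3n: field 3 compared as a number
    return (fifth_char, third_numeric)


records = [
    "A123B    RANDO  123",
    "B120T    MRAND  567",
    "M234K    OMRAN  678",
    "A123B    DOMRA  999",
]

for line in sorted(records, key=sort_k_key):
    print(line)
```

Because the key is tied to character position 5, a Field A of a different length (for example A13B) would be keyed on the wrong character, or on none at all.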
 
feldmani (author) commented:
poid99, this is ALMOST what I am looking for. However, the character's position is not always constant. As I specified, the sort has to be by the LAST character of Field A, NOT by the fifth character. Basically, I have to take the substr, get rid of the zeros, then sort.
 
poid99 commented:
Say the data file is:
A123B   RANDO   123
A13B    DOMRA   999
B120T   MRANA   67
B120T   MRAND   223
A13B    MRA     96
M23M4B  OMRAN   78
//-EOF-//

What should the output be?
 
feldmani (author) commented:
B120T   MRANA   67
B120T   MRAND   223

A13B    MRA     96
A13B    DOMRA   999

A123B   RANDO   123

M23M4B  OMRAN   78

That is what the output should look like.
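
One ordering rule that reproduces the output above, sketched in Python. It is an assumption inferred from this single example, not a confirmed specification: records are keyed on the last digit appearing anywhere in Field A, then on the smallest Field C within each identical Field A group, then on Field A and Field C. The names last_digit and grouped_sort are hypothetical.

```python
import re


def last_digit(field_a):
    """Return the last digit appearing in Field A (assumed meaning of 'last digit')."""
    digits = re.findall(r"\d", field_a)
    return int(digits[-1]) if digits else -1


def grouped_sort(lines):
    """Sort records so identical Field A values stay together, sub-sorted by Field C."""
    records = [line.split() for line in lines]
    # Smallest Field C seen for each distinct Field A; used to order tied groups.
    group_min = {}
    for field_a, _, field_c in records:
        group_min[field_a] = min(group_min.get(field_a, int(field_c)), int(field_c))

    def key(rec):
        field_a, _, field_c = rec
        return (last_digit(field_a), group_min[field_a], field_a, int(field_c))

    return sorted(records, key=key)


data = [
    "A123B   RANDO   123",
    "A13B    DOMRA   999",
    "B120T   MRANA   67",
    "B120T   MRAND   223",
    "A13B    MRA     96",
    "M23M4B  OMRAN   78",
]

for rec in grouped_sort(data):
    print(*rec)
```

If "last digit" instead means the last digit of the first run of digits in Field A, M23M4B would key on 3 rather than 4 and would sort ahead of the A13B records.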
 
poid99 commented:
Is this also acceptable output?
M23M4B  OMRAN   78
A13B    MRA     96
A13B    DOMRA   999
A123B   RANDO   123

------------
I think you first need to break it up. First, sort by the last character of the first field (resolving ties on the first field).

That turns my sample data from the previous post into:
B120T   MRANA   67
B120T   MRAND   223
A123B   RANDO   123
A13B    DOMRA   999
A13B    MRA     96
M23M4B  OMRAN   78

then sort each sub group:

{B120T   MRANA   67,
 B120T   MRAND   223}
&
{A123B   RANDO   123,
 A13B    DOMRA   999,
 A13B    MRA     96,
 M23M4B  OMRAN   78}

What if you store the records in some sort of linked structure? The next pointer points to another record that has an identical field 1.

typedef struct ARecord
{
   char* field1;
   char* field2;
   int field3;
   struct ARecord* next;
} Record;
Record table[N];

table[0] = {A123B       RANDO     123   NULL}
table[1] = {A13B         DOMRA     999   NULL}
table[2] = {A13B         MRA           96   NULL}
table[3] = {M23M4B  OMRAN     78   NULL}
becomes:
table[0] = {A123B       RANDO     123   NULL}
table[1] = {A13B         DOMRA     999   {A13B      MRA     96   NULL}}
table[2] = {M23M4B  OMRAN     78   NULL}

Start at N and work backwards:
for (int i = N - 1; i > 0; i--)
{
  /* compare the strings, not the pointers (strcmp is from <string.h>) */
  if (strcmp(table[i].field1, table[i-1].field1) == 0)
  {
      table[i-1].next = copy_entry(table[i]);
      delete_entry(table, i);
      N--;
  }
}

then to sort each subgroup:
for (0 to N)
do
   // sort each list so:
   // table[i].field3 < table[i].next->field3 ...
   sort table[i](from table[i].field3 to table[i].next->field3 ...)
done

// table[1] = {A13B         DOMRA     999   {A13B      MRA     96   NULL}}
// becomes:
// table[1] = {A13B      MRA     96   {A13B         DOMRA     999   NULL}}

for (0 to N)
do
   sort table[0 ..  N].field3
done

table[0] = {M23M4B  OMRAN     78   NULL}
table[1] = {A13B      MRA     96   {A13B         DOMRA     999   NULL}}
table[2] = {A123B       RANDO     123   NULL}
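
The grouping steps above can be sketched in Python, using a dict of lists in place of the linked next pointers. This is a loose illustration rather than a translation of the pseudocode; the function name group_then_sort is hypothetical, and the B120T rows are left out to match the four-record table.

```python
from collections import defaultdict


def group_then_sort(rows):
    """Group rows by field 1, sort inside each group by field 3, then order the groups."""
    groups = defaultdict(list)
    for field1, field2, field3 in rows:
        groups[field1].append((field1, field2, int(field3)))

    # Sort each group so that field 3 ascends within the group.
    for members in groups.values():
        members.sort(key=lambda rec: rec[2])

    # Order whole groups by the field 3 of their first (smallest) member.
    return sorted(groups.values(), key=lambda members: members[0][2])


table = [
    ("A123B", "RANDO", "123"),
    ("A13B", "DOMRA", "999"),
    ("A13B", "MRA", "96"),
    ("M23M4B", "OMRAN", "78"),
]

for members in group_then_sort(table):
    print(members)
```

A dict keeps each group in one place without the copy and delete bookkeeping that the array-plus-pointer layout needs.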

# A side note: the following sed command extracts the last character of the first field and pastes
# it onto the end of each line (the character classes allow any run of blanks between the fields):
sed 's/^[^[:space:]]*\([^[:space:]]\)[[:space:]].*$/\1/' < DATAFILE | paste DATAFILE -
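
The same idea as the sed and paste pipeline, sketched in Python: append the last character of the first field to each line so that an ordinary sort can key on it. The function name append_last_char is hypothetical, and a tab is assumed as the separator because that is what paste uses by default.

```python
def append_last_char(line):
    """Return the line with the last character of its first field appended after a tab."""
    stripped = line.rstrip("\n")
    fields = stripped.split()
    last_char = fields[0][-1] if fields else ""
    return stripped + "\t" + last_char


sample = ["A123B   RANDO   123", "M23M4B  OMRAN   78"]
for line in sample:
    print(append_last_char(line))
```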