unix sort -u -k (Perl, Unix)

Hi,
I have been using the unix sort -u command within a Perl script.
I have an input file that has the following format:


2010-08-13T01:04:30,npdisconnect,0477203120,0477203120,,MOBM,C4900


That is: timestamp,command,number1,number2, then a field of 4 characters or fewer (it can be empty), a 4-character field and a 5-character field.
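To make the field positions concrete, here is a minimal Perl sketch that splits one sample line on commas and labels the pieces. The field names (including the generic field5 to field7 for the three short fields) and the sub name parse_line are made up for illustration; the point is that number1 lands at index 2 of the split result.

```perl
use strict;
use warnings;

# Hypothetical field names; the question only names the first four fields.
my @names = qw(timestamp command number1 number2 field5 field6 field7);

# Split one line on commas and return a hash ref keyed by the names above.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 also keeps trailing empty fields; the empty 5th field
    # in ",,MOBM" is kept either way because it is not at the end.
    my @fields = split /,/, $line, -1;
    my %rec;
    @rec{@names} = @fields;    # hash slice: $names[i] => $fields[i]
    return \%rec;
}

my $rec = parse_line("2010-08-13T01:04:30,npdisconnect,0477203120,0477203120,,MOBM,C4900\n");
print "$rec->{timestamp} $rec->{number1}\n";    # number1 is index 2 of the split
```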

What I am interested in are the timestamp and number1.

Sometimes several lines have the same number1:


2010-08-13T11:30:54,npdisconnect,0496395646,0496395646,,BEMO,C4700
2010-08-13T11:36:01,npdisconnect,0496395646,0496395646,,BEMO,C4700

In this case I would like to keep only the line with the latest timestamp, i.e.:

2010-08-13T11:36:01,npdisconnect,0496395646,0496395646,,BEMO,C4700

I tried various versions of sort -u -k.
The -k option works in terms of first field, second field and so on, but since my
fields are separated by commas, I am not sure I can use -k.

In any case, is there a way to use sort -u -k to keep only the line with the latest
timestamp for each duplicated number1 (or a Perl trick that does this)?

example of input file:

2010-08-13T01:04:30,npdisconnect,0477203120,0477203120,,MOBM,C4900
2010-08-13T11:30:54,npdisconnect,0496395646,0496395646,,BEMO,C4700
2010-08-13T11:36:01,npdisconnect,0496395646,0496395646,,BEMO,C4700


desired output file:

2010-08-13T01:04:30,npdisconnect,0477203120,0477203120,,MOBM,C4900
2010-08-13T11:36:01,npdisconnect,0496395646,0496395646,,BEMO,C4700

Thanks.
Johannne1 asked:

jeromee commented:
This should work for you:
perl -F/,/ -ane'print unless $s{$F[2]}; $s{$F[2]}++' /your/file/path

Good luck!
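For anyone unfamiliar with the switches in that one-liner: -n wraps the code in a loop over the input lines, -a splits each line into @F, and -F/,/ makes that split happen on commas. Below is a rough long-hand equivalent, wrapped in a sub (the name keep_first_per_number is only for illustration) so it can be run on the sample data. Because it keeps the first line it sees for each number1, time-ordered input gives the earliest timestamp, not the latest.

```perl
use strict;
use warnings;

# Long-hand version of:
#   perl -F/,/ -ane'print unless $s{$F[2]}; $s{$F[2]}++' file
sub keep_first_per_number {
    my @lines = @_;
    my %seen;    # number1 => how many times it has been seen so far
    my @kept;
    for my $line (@lines) {
        my @F = split /,/, $line;                # what -a does with -F/,/
        push @kept, $line unless $seen{$F[2]};   # first occurrence only
        $seen{$F[2]}++;
    }
    return @kept;
}

my @input = (
    "2010-08-13T01:04:30,npdisconnect,0477203120,0477203120,,MOBM,C4900\n",
    "2010-08-13T11:30:54,npdisconnect,0496395646,0496395646,,BEMO,C4700\n",
    "2010-08-13T11:36:01,npdisconnect,0496395646,0496395646,,BEMO,C4700\n",
);
print keep_first_per_number(@input);
```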
Johannne1 (author) commented:
Hi Jeromee,
Hang on, I shall try this, and let you know in about 10-15 minutes.
Johannne1 (author) commented:
Hi Jeromee,
Can you please explain it? Is this a special way of running Perl? If so, I can't use it.
Right now my Perl script has:

system "/usr/bin/sort -u $outfile > $outfilesorted";
system "/usr/bin/cp " . $outfilesorted." ". $outputname;
system "/usr/bin/rm " . $outfilesorted;
system "/usr/bin/rm " . $outfile;


I can change the sort -u to some kind of sort -u -k.
Can you incorporate your solution, which looks like a regular expression substitution,
into the sort -u line above?

wilcoxon commented:
Jeromee's solution would replace your Perl script entirely, and it would give you the earliest time rather than the latest.

You can do what you want within a Perl script like this. Let me know if there are any problems (input not sorted by time, output must be sorted by time, etc.).
#!/usr/local/bin/perl
use strict;
use warnings;

my $infile = shift; # pass input file name on command line
#my $infile = 'somefile'; # alternately hard-code it

# read the $infile and only keep the latest time row
# assumes input is in time order: a later line with the same number1
# overwrites the earlier one in the hash
open my $in, '<', $infile or die "could not open $infile: $!";
my %data = map { chomp; my @arr = split /,/; $arr[2] => $_ } <$in>;
close $in;

# output the file to STDOUT
# this will be unsorted but unique
print $data{$_}, "\n" foreach (keys %data);
# this would sort by number1 field
# print $data{$_}, "\n" foreach (sort keys %data);
# this would sort by timestamp: each line starts with the timestamp,
# so a plain string sort of the values is chronological
# print $_, "\n" foreach (sort values %data);
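The map above relies on the input already being in time order, because a later line with the same number1 simply overwrites the earlier one. If that cannot be guaranteed, one option is to compare the timestamps explicitly. A minimal sketch, assuming every timestamp uses the same YYYY-MM-DDTHH:MM:SS layout shown in the question (so string comparison with gt is chronological); the sub name latest_per_number is hypothetical:

```perl
use strict;
use warnings;

# Keep, for each number1 (field index 2), the line with the greatest timestamp.
# Input order does not matter; timestamps are compared as strings, which is
# chronological only if they all share the same fixed-width format.
sub latest_per_number {
    my ($fh) = @_;
    my (%line, %time);    # number1 => best line / its timestamp
    while (my $row = <$fh>) {
        chomp $row;
        next unless length $row;    # skip blank lines
        my ($ts, undef, $num) = split /,/, $row;
        if (!exists $time{$num} || $ts gt $time{$num}) {
            $time{$num} = $ts;
            $line{$num} = $row;
        }
    }
    # lines start with the timestamp, so a string sort puts them in time order
    return sort values %line;
}

# Demo on deliberately out-of-order input read from an in-memory string
my $text = <<'EOT';
2010-08-13T11:36:01,npdisconnect,0496395646,0496395646,,BEMO,C4700
2010-08-13T01:04:30,npdisconnect,0477203120,0477203120,,MOBM,C4900
2010-08-13T11:30:54,npdisconnect,0496395646,0496395646,,BEMO,C4700
EOT
open my $in, '<', \$text or die "cannot open in-memory input: $!";
my @result = latest_per_number($in);
close $in;
print "$_\n" for @result;
```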

Johannne1 (author) commented:
Hi Wilcoxon,
I will try this out. It will take me about 30 minutes.
Johanne
jeromee commented:
If you want the latest timestamp, this should work:
    perl -F/,/ -ane'$s{$F[2]}=$_; END{print sort values %s}' /your/file/path
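In this version the loop body stores every line under its number1, so later lines overwrite earlier ones, and the END block runs once after the last line has been read. A long-hand sketch of the same logic; the sub name last_per_number_sorted is illustrative, and like the one-liner it assumes time-ordered input:

```perl
use strict;
use warnings;

# Long-hand version of:
#   perl -F/,/ -ane'$s{$F[2]}=$_; END{print sort values %s}' file
sub last_per_number_sorted {
    my ($fh) = @_;
    my %s;
    # the -n loop: runs once per input line
    while (my $line = <$fh>) {
        my @F = split /,/, $line;    # what -a does with -F/,/
        $s{$F[2]} = $line;           # a later line replaces an earlier one
    }
    # the END block: lines start with the timestamp, so a string sort
    # of the stored lines puts them in time order
    return sort values %s;
}

my $text = "2010-08-13T01:04:30,npdisconnect,0477203120,0477203120,,MOBM,C4900\n"
         . "2010-08-13T11:30:54,npdisconnect,0496395646,0496395646,,BEMO,C4700\n"
         . "2010-08-13T11:36:01,npdisconnect,0496395646,0496395646,,BEMO,C4700\n";
open my $in, '<', \$text or die "cannot open in-memory input: $!";
my @out = last_per_number_sorted($in);
close $in;
print @out;
```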

jeromee commented:
If you want to replace this line in your script:
    system "/usr/bin/sort -u $outfile > $outfilesorted";
try this:
    system q(perl -F/,/ -ane'$s{$F[2]}=$_; END{print sort values %s}' ). "$outfile > $outfilesorted";
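Another option is to skip the external commands altogether and do the de-duplication inside the existing script, which also removes the temporary $outfilesorted and the cp/rm calls. A sketch under the same assumption as the one-liner (input in time order, so the last line per number1 wins); dedupe_file is a hypothetical helper name, while $outfile and $outputname are the variables from the script above:

```perl
use strict;
use warnings;

# Hypothetical helper: keep the last line seen for each number1 (field 2)
# of $src and write those lines, sorted by timestamp, to $dst.
# Returns the number of lines written.
sub dedupe_file {
    my ($src, $dst) = @_;
    open my $in, '<', $src or die "could not open $src: $!";
    my %s;
    while (my $line = <$in>) {
        chomp $line;
        next unless $line =~ /\S/;    # ignore blank lines
        my @F = split /,/, $line;
        $s{$F[2]} = $line;            # later lines overwrite earlier ones
    }
    close $in;

    open my $out, '>', $dst or die "could not open $dst: $!";
    # lines start with the timestamp, so a string sort is chronological
    print {$out} map { "$_\n" } sort values %s;
    close $out or die "could not write $dst: $!";
    return scalar keys %s;
}

# In the original script the four system() calls would become roughly:
#   dedupe_file($outfile, $outputname);
#   unlink $outfile or warn "could not remove $outfile: $!";
```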


Johannne1 (author) commented:
Hi Jeromee,
You are fast! I didn't see the above, so I will try it. I was trying Wilcoxon's, but I will try this out soon.
Johannne1 (author) commented:
Hi Wilcoxon,

I got this to work. It is taking me a long time because I am trying to understand how the key/value
pairs work in your solution. First, I have read that Perl does not maintain the order of elements
in a hash. Looking at your my %data line: you build it with map, remove the newline with chomp,
and split each line on commas. I think $arr[2] is number1.
I am not sure how foreach (keys %data) manages to output exactly what I want, but it works.
I understand HashMap and the other map data structures in Java; can you explain how
this foreach works with %data?
I added an output file, and sometimes the "20" of "2010" appears to be chopped off; I am not sure why.
The output is otherwise correct:


$ more perl_sort.pl
#!/usr/local/bin/perl
use strict;
use warnings;

my $inputFile;
my $infile = "inputXXX.txt";
my $outputFile;
my $outputname = "outputXXX.txt";


# read the $infile and only keep the latest time row when duplicate numbers occur
# assumes input is in time order

open IN, $infile or die "could not open $infile: $!";
my %data = map { chomp; my @arr = split /,/; $arr[2] => $_ } <IN>;
close IN;

open ($outputFile, ">$outputname") || die "Can not open output file";
# output the file to STDOUT
# this will be unsorted but unique
#print $data{$_}, "\n" foreach (keys %data);

foreach (keys %data) {
    print $data{$_}, "\n";
    print $outputFile "  $data{$_}, \n";
}

Here is the output file; it has a comma in front of the 2010:

$ more outputXXX.txt
, 2010-08-13T14:32:14,npbroadcast,0470544430,0470544430,BEMO,MOBM,C4900
, 2010-08-13T11:42:14,npdisconnect,0494047810,0494047810,,BEMO,C4700

Is there a way to get:

2010-08-13T14:32:14,npbroadcast,0470544430,0470544430,BEMO,MOBM,C4900
2010-08-13T11:42:14,npdisconnect,0494047810,0494047810,,BEMO,C4700

Or is this complicated? If it is, I can live with the comma and space, or strip them out in another script.

wilcoxon commented:
Sure.  I'll go over what the important lines are doing...

my %data = map { chomp; my @arr = split /,/; $arr[2] => $_ } <IN>;

is a short way of doing:

my %data;
# loop over each line in the input
while (<IN>) {
    # remove the newline
    chomp;
    # split the line on commas and assign each piece to @arr
    my @arr = split /,/;
    # store the full line in the hash under the key number1 ($arr[2]),
    # overwriting any earlier line with the same number1
    $data{$arr[2]} = $_;
}

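foreach (keys %data) is unordered: Perl makes no promise about the order in which a hash returns its keys, so if the output happens to come out in the order you want, that is luck rather than design (and newer Perls can change the order from one run to the next). Use sort keys or sort values when the order matters.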

You can remove the comma and space by changing:
print $outputFile "  $data{$_}, \n";
to
# {} added around $outputFile to make it clearer that it is an output file/stream
print {$outputFile} $data{$_}, "\n";

A quick hash primer... In Perl, a hash in list context is effectively a list of the form (key1, val1, key2, val2, ..., keyX, valX). The function "keys" effectively returns the items at even positions (0, 2, 4, ...) and "values" returns the items at odd positions. This is why "%hash = map { $key => $val } @list" works: map returns a flat list, and assigning that list to a hash pairs it up into keys and values, with a repeated key keeping the last value assigned. So "foreach (keys %data)" loops over the keys of the %data hash one at a time. Writing this made me realize that it could have been written more succinctly as "print $_, "\n" foreach (values %data)", which achieves the same thing (the order is still unspecified either way).
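A tiny illustration of that primer, using made-up number1/time pairs: assigning a flat list to a hash pairs it up, and a repeated key keeps the value assigned last, which is why the later (latest) line survives in %data. Sorting the keys gives a deterministic output order.

```perl
use strict;
use warnings;

# A flat (key, value, key, value, ...) list, like the one map produces.
# The key '0496395646' appears twice; the later value wins.
my @pairs = ('0496395646' => '11:30:54',
             '0477203120' => '01:04:30',
             '0496395646' => '11:36:01');
my %h = @pairs;

# hash order is unspecified, so sort the keys for a stable order
for my $k (sort keys %h) {
    print "$k => $h{$k}\n";
}
```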

Let me know if you have any more questions...