joaotelles

Shell - syntax to sort lines

Hi,

I have a situation that I'm not able to build a shell command line for. I hope it's not too complicated to explain.

Using this command I get the matching lines from files that are generated daily, going back 7 days:

find /var/opt/smarttreee/dpa/log/event/output/TrafficErrorEvents* -type f -mtime -7 -print0 | xargs -0 grep MDMD3

For this I get lines like this as output:

/var/opt/smarttreee/dpa/log/event/output/TrafficErrorEvents20130917.0340:20130917054940#20130917054940#6388920287#MDMD3#mdm1a#1#6388920288#3#ConfigurationName=SP_ADD#ConfigurationVersion=2.2#DestinationAddress=59175959807#IMEI=35801604028935#MSISDN=59175959807#TerminalId=35801604028935#
/var/opt/smarttrust/dpa/log/event/output/TrafficErrorEvents20130917.0340:20130917054940#20130917054940#6388920282#MDMD3#mdm1a#1#6388920283#3#ConfigurationName=SP_ADD#ConfigurationVersion=2.2#DestinationAddress=59178927867#IMEI=35210177000649#MSISDN=59178927867#TerminalId=35210177000649#

====

Some of these lines have this particular part duplicated:

IMEI=35210177

Not necessarily this number. For example, I can have something like this:

/var/opt/smarttreee/dpa/log/event/output/TrafficErrorEvents20130917.0340:20130917054936#20130917054936#6388925052#MDMD3#mdm2a#1#6388925053#3#ConfigurationName=SP_ADD#ConfigurationVersion=2.2#DestinationAddress=59178470041#IMEI=35828141810883#MSISDN=59178470041#TerminalId=35828141810883#
/var/opt/smarttreee/dpa/log/event/output/TrafficErrorEvents20130917.0340:20130917054940#20130917054940#6388920276#MDMD3#mdm1a#1#6388920277#3#ConfigurationName=SP_ADD#ConfigurationVersion=2.2#DestinationAddress=59178961090#IMEI=35828141490515#MSISDN=59178961090#TerminalId=35763676490515#

Note that IMEI=35828141 is duplicated.

=====

So what I need is the count of lines (wc -l), with lines that share the IMEI part mentioned above counted only once.

For example, for the four lines I posted above, I would get a count of 3, since two of them have the same IMEI=XXXXXXXX.

Is this possible?

Tks,
Joao
ozo

perl -le ' /#IMEI=(\w{8})/ && ++$c{$1} for </var/opt/smarttreee/dpa/log/event/output/TrafficErrorEvents*MDMD3*>;print scalar keys %c'
How did you get
/var/opt/smarttrust/dpa/log/event/output/TrafficErrorEvents20130917.0340:20130917054940#20130917054940#6388920282#MDMD3#mdm1a#1#6388920283#3#ConfigurationName=SP_ADD#ConfigurationVersion=2.2#DestinationAddress=59178927867#IMEI=35210177000649#MSISDN=59178927867#TerminalId=35210177000649#
as output from
find /var/opt/smarttreee/dpa/log/event/output/TrafficErrorEvents*
?
perl -le '-f && 7 > -M && /#IMEI=(\w{8})/ && ++$c{$1} for </var/opt/smarttreee/dpa/log/event/output/TrafficErrorEvents*MDMD3*>;print scalar keys %c'
joaotelles

ASKER

I'm sorry, I mixed up the outputs. The right one is:

/var/opt/smarttreee/dpa/log/event/output/TrafficErrorEvents20130917.0340:20130917054940#20130917054940#6388920282#MDMD3#mdm1a#1#6388920283#3#ConfigurationName=SP_ADD#ConfigurationVersion=2.2#DestinationAddress=59178927867#IMEI=35210177000649#MSISDN=59178927867#TerminalId=35210177000649#
Sorry for the newbie question, but do I have to include anything to put this command in a script?

perl -le '-f && 7 > -M && /#IMEI=(\w{8})/ && ++$c{$1} for </var/opt/smarttreee/dpa/log/event/output/TrafficErrorEvents*MDMD3*>;print scalar keys %c'

Something like this?

#!/bin/perl

Or is this enough?

#!/bin/sh
You should be able to include whatever you had included when you put your
find /var/opt/smarttreee/dpa/log/event/output/TrafficErrorEvents* -type f -mtime -7 -print0 | xargs -0 grep MDMD3 | wc -l
command in a script
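On the shebang question, a minimal sketch: the one-liners in this thread are started from a shell, so a `#!/bin/sh` line is enough, while `#!/bin/perl` would only fit a file that contains Perl source itself. The function name `count_mdmd3_lines` and passing the directory as an argument are assumptions for illustration, not something from the thread.

```shell
#!/bin/sh
# Sketch: the original find | grep | wc -l pipeline wrapped in a shell function.
# count_mdmd3_lines is a hypothetical helper name; the log directory is passed
# as the first argument instead of being hard-coded.
count_mdmd3_lines() {
    dir=$1
    # -mtime -7 keeps files modified less than 7 days ago.
    # -exec ... {} + batches the file names, much like xargs does.
    # grep -h drops the "filename:" prefix so only the log lines are counted.
    find "$dir"/TrafficErrorEvents* -type f -mtime -7 \
        -exec grep -h MDMD3 {} + | wc -l
}
```

Saved in a file with the function call as its last line (for example `count_mdmd3_lines /var/opt/smarttreee/dpa/log/event/output`) and made executable with `chmod +x`, it runs like any other command.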
It didn't work.

> perl -le '-f && 7 > -M && /#IMEI=(\w{8})/ && ++$c{$1} for </var/opt/smarttreee/dpa/log/event/output/TrafficErrorEvents*MDMD3*>;print scalar keys %c'
0

> pwd
/var/opt/smarttreee/dpa/log/event/output

And the file has lines with MDMD3:

> grep MDMD3 TrafficErrorEvents20130924.0695 | more
20130924084946#20130924084946#6408537165#MDMD3#mdm2a#1#6408537166#3#ConfigurationName=SP_ADD#ConfigurationVersion=2.2#DestinationAddress=59176159059#IMEI=38401303344524#MSISDN=59176159059#TerminalId=38401303344524#
20130924084951#20130924084951#6408537183#MDMD3#mdm2a#1#6408537184#3#ConfigurationName=SP_ADD#ConfigurationVersion=2.2#DestinationAddress=59177254495#IMEI=35598682364489#MSISDN=59177254495#TerminalId=35598682364489#

Not sure if this is a problem, but I have more than one file per day. I need the last 7 days to be analyzed, so this could mean more than 7 files. For example:

TrafficErrorEvents20130923.0689
TrafficErrorEvents20130923.0690
TrafficErrorEvents20130923.0691
TrafficErrorEvents20130924.0692
TrafficErrorEvents20130924.0693
TrafficErrorEvents20130924.0694
TrafficErrorEvents20130924.0695

Tks,
Joao
Sorry, I thought
/var/opt/smarttreee/dpa/log/event/output/TrafficErrorEvents20130917.0340:20130917054940#20130917054940#6388920287#MDMD3#mdm1a#1#6388920288#3#ConfigurationName=SP_ADD#ConfigurationVersion=2.2#DestinationAddress=59175959807#IMEI=35801604028935#MSISDN=59175959807#TerminalId=35801604028935#
was the name of the file
If it is a line in the file, then the command should be
perl -lne '/MDMD3/ && /#IMEI=(\w{8})/ && ++$c{$1}; END{print scalar keys %c}' TrafficErrorEvents20130924.0695
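For readers without Perl at hand, roughly the same count can be sketched with grep, sed and sort. This is an approximation of the one-liner above, not a replacement for it; `count_imei_prefixes` is a hypothetical helper name, and the bracket expression `[A-Za-z0-9_]` stands in for Perl's `\w`.

```shell
#!/bin/sh
# Sketch: count distinct 8-character IMEI prefixes on MDMD3 lines.
# count_imei_prefixes is a hypothetical helper name, not part of the thread.
count_imei_prefixes() {
    # grep -h keeps the MDMD3 lines without a "filename:" prefix.
    # sed prints only the first 8 characters that follow "#IMEI=".
    # sort -u collapses duplicates wherever they occur; wc -l counts the rest.
    grep -h MDMD3 "$@" |
        sed -n 's/.*#IMEI=\([A-Za-z0-9_]\{8\}\).*/\1/p' |
        sort -u | wc -l
}
```

Called as `count_imei_prefixes TrafficErrorEvents20130924.0695`, it takes one or more file names, just like the Perl command.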
Tks!

> perl -lne '/MDMD3/ && /#IMEI=(\w{8})/ && ++$c{$1}; END{print scalar keys %c}' TrafficErrorEvents20130924.0695
606

But this analyzes only one file. I need something that would analyze the files from the last 7 days.

(Note that I can have more than one file per day, as I highlighted in my last post.)

For example:

TrafficErrorEvents20130923.0689
TrafficErrorEvents20130923.0690
TrafficErrorEvents20130923.0691
TrafficErrorEvents20130924.0692
TrafficErrorEvents20130924.0693
TrafficErrorEvents20130924.0694
TrafficErrorEvents20130924.0695
perl -lne '/MDMD3/ && /#IMEI=(\w{8})/ && ++$c{$1}; END{print scalar keys %c}'  TrafficErrorEvents2013092[34]*
or
perl -lne 'BEGIN{@ARGV=grep-f && 7 > -M,<*> unless @ARGV}/MDMD3/ && /#IMEI=(\w{8})/ && ++$c{$1}; END{print scalar keys %c}'
I'm getting different results for them:

> perl -lne 'BEGIN{@ARGV=grep-f && 7 > -M,<*> unless @ARGV}/MDMD3/ && /#IMEI=(\w{8})/ && ++$c{$1}; END{print scalar keys %c}'
31462

> perl -lne '/MDMD3/ && /#IMEI=(\w{8})/ && ++$c{$1}; END{print scalar keys %c}'  TrafficErrorEvents2013092[34]*
5673

Is there a way to check which files each command is looking into? Like printing the lines instead of the number of lines? Just to check if the count is getting the right files.

For the first one I have to be in the files' directory, right? (That's OK.)

Tks,
Joao
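To see which files the modification-time test picks up, a small sketch may help. It lists the regular files directly inside a directory that were modified less than 7 days ago, which is roughly the set that `grep -f && 7 > -M, <*>` selects (Perl's `<*>` skips dotfiles, `find` does not). That set covers every recent file in the directory, whatever its name, while the glob `TrafficErrorEvents2013092[34]*` covers two days only, which would explain the different counts. `list_recent_files` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the regular files in one directory modified less than 7 days ago.
# list_recent_files is a hypothetical helper name, not part of the thread.
list_recent_files() {
    # The subshell keeps the cd from affecting the caller.
    # "! -name . -prune" stops find from descending into subdirectories.
    # sed strips the leading "./"; LC_ALL=C gives a locale-independent order.
    (cd "$1" &&
        find . ! -name . -prune -type f -mtime -7 -print |
        sed 's|^\./||' | LC_ALL=C sort)
}
```

Comparing the output of `list_recent_files .` with `ls TrafficErrorEvents2013092[34]*` shows which files only one of the two commands reads.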
perl -lne 'BEGIN{@ARGV=grep-f && 7 > -M,<*> unless @ARGV}/MDMD3/ && /#IMEI=(\w{8})/ && !$c{$1}++ && print'
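The `!$c{$1}++ && print` part prints a line only the first time its IMEI prefix is seen. The same idiom exists in awk, sketched below under the assumption that the IMEI value consists of letters, digits or underscores (Perl's `\w`). `first_line_per_prefix` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the first MDMD3 line seen for each 8-character IMEI prefix.
# first_line_per_prefix is a hypothetical helper name, not part of the thread.
first_line_per_prefix() {
    awk '
        /MDMD3/ && match($0, /#IMEI=[A-Za-z0-9_]+/) {
            # RSTART points at "#"; the value starts 6 characters later.
            value = substr($0, RSTART + 6, RLENGTH - 6)
            # Lines whose first IMEI value is shorter than 8 characters are skipped.
            if (length(value) < 8) next
            # seen[...]++ is 0 (false) only on the first occurrence of a prefix.
            if (!seen[substr(value, 1, 8)]++) print
        }
    ' "$@"
}
```

Piping the output through `wc -l` gives the same number as the counting variant of the command.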
Tks! I will test it.
Tks.. it is working perfectly!

I will use this one:
> perl -lne 'BEGIN{@ARGV=grep-f && 7 > -M,<*> unless @ARGV}/MDMD3/ && /#IMEI=(\w{8})/ && ++$c{$1}; END{print scalar keys %c}'

===

I have one last question: Is there a way for it not to take the files that have the timestamp of the current day?

For example, let's say I have files like this:

TrafficErrorEvents20130917.0689
.
.
TrafficErrorEvents20130921.0689
TrafficErrorEvents20130922.0690
TrafficErrorEvents20130923.0691
TrafficErrorEvents20130923.0692
TrafficErrorEvents20130924.0693
TrafficErrorEvents20130924.0694

Using your command I'm reading the files from 09/24 back to 09/17 (considering today as 09/24).

So, is there a way for it to read the files from 09/23 back to 09/17, NOT reading the files from 09/24?

Tks,
Joao Telles
Does it have to be by the timestamp in file name, or can it be by the modification time like the 7 day cutoff?
Does it have to be the beginning of the current day, or can it be 24 hours ago?
Would you also want the 7 day cutoff to be based on the name of the file rather than the modification time, and should 7 days mean something like 144 hours before the beginning of the current day (or 143 or 145 if a daylight saving switchover occurred) instead of 168 hours ago, as it is now?
It has to be from the beginning of the current day until 7 days ago, so not 24 hours, because 24 hours might eliminate a file from yesterday and this can't happen.

If you do it by the hour, I would have to run it at a specific time of the day to get the 7 days as described above; otherwise it would eliminate a file from yesterday.

So I think it has to be by the timestamp in the file.
By "timestamp" do you mean in the name of the file, or the modification (or creation) time in the file status information?
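If the date in the file name is what counts, one possible approach is sketched below. It is only an illustration and not necessarily what the accepted solution does. It assumes names of the form `TrafficErrorEventsYYYYMMDD.NNNN`, as in the listings above; `files_in_date_range` is a hypothetical helper name, and the two dates are passed in explicitly so that the selection does not depend on the time of day the script runs.

```shell
#!/bin/sh
# Sketch: select files by the YYYYMMDD stamp embedded in their name.
# files_in_date_range is a hypothetical helper name, not part of the thread.
# Usage: files_in_date_range FROM TO FILE...   (FROM and TO as YYYYMMDD)
files_in_date_range() {
    from=$1
    to=$2
    shift 2
    for f in "$@"; do
        # Drop everything up to "TrafficErrorEvents", then the ".NNNN" suffix.
        stamp=${f##*TrafficErrorEvents}
        stamp=${stamp%%.*}
        # Skip names that do not carry exactly eight digits.
        case $stamp in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
            *) continue ;;
        esac
        # Print the name when FROM <= stamp <= TO.
        if [ "$stamp" -ge "$from" ] && [ "$stamp" -le "$to" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

With GNU date (an assumption; `-d` is not available everywhere), the range "7 days ago up to yesterday" could be obtained as `files_in_date_range "$(date -d '7 days ago' +%Y%m%d)" "$(date -d yesterday +%Y%m%d)" TrafficErrorEvents*`, and the resulting names passed on to the counting command.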
ASKER CERTIFIED SOLUTION
ozo
This solution is only available to members.
It's fine like this, but I'm getting the lines as output:

20130918000021#20130918000021#6394535421#MDMD3#mdm2a#1#6394535422#3#ConfigurationName=SP_ADD#ConfigurationVersion=2.2#DestinationAddress=59176807441#IMEI=35662200145207#MSISDN=59176807441#TerminalId=35662200145207#

Can you make it output the number of lines?

Tks,
Joao
SOLUTION
This solution is only available to members.
skullnobrains

sed -ne 's/.*MDMD.*IMEI=\([0-9]*\).*/\1/p' $(find /var/opt/smarttreee/dpa/log/event/output/TrafficErrorEvents* -type f -mtime -7) | uniq | wc -l

The above assumes duplicates follow one another. If not, you'd have to stick a sort before the uniq. awk would be more efficient.
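A minimal awk sketch of that last remark: it counts distinct prefixes in one pass, so duplicates do not need to be adjacent and no sort is required. Note that the sed expression above keeps the whole IMEI, while this sketch truncates it to the first eight digits, as the question asks. `count_prefixes_awk` is a hypothetical helper name, and the IMEI is assumed to be all digits.

```shell
#!/bin/sh
# Sketch: one-pass count of distinct 8-digit IMEI prefixes on MDMD3 lines.
# count_prefixes_awk is a hypothetical helper name, not part of the thread.
count_prefixes_awk() {
    awk '
        /MDMD3/ && match($0, /#IMEI=[0-9]+/) {
            # The digits start 6 characters after the "#" that RSTART points at.
            digits = substr($0, RSTART + 6, RLENGTH - 6)
            # Count a prefix only the first time it is seen.
            if (length(digits) >= 8 && !seen[substr(digits, 1, 8)]++) n++
        }
        # n + 0 prints 0 instead of an empty line when nothing matched.
        END { print n + 0 }
    ' "$@"
}
```

The file list can come from the same `find ... -mtime -7` call, for example `count_prefixes_awk $(find /var/opt/smarttreee/dpa/log/event/output/TrafficErrorEvents* -type f -mtime -7)`.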
Tks.