altquark

asked on

Gawk/Sed assistance for some text files

Hi everyone

I have a lot of horrible log files - and I'm trying to extract the valid information from them into a nice CSV format or something similar.  The system I'm on is a Solaris system (SunOS xxxxname 5.10 Generic_138888-08 sun4v sparc SUNW,SPARC-Enterprise-T5220) - so I have grep, awk and sed - but I don't have grep -A/-B.

There are a large number of log files - all of the files are named "xxx_{pid}.log" - so, for example, I have files "xxx_3074.log" and "xxx_6781.log"

Now, only SOME of these files have the important information I'm seeking.  First of all, the only files I'm interested in are those that have the following information in the first few lines (example follows):

3074/1 MAIN_THREAD                              Wed Apr  7 06:23:10.652804      ipcmisc.c299
        process 3074 <xxxnet_n> registered in entry 21

In effect, the key is "<xxxnet_n> registered" - no other files have this.  Here is an example of my grepping for this:

grep '<xxxnet_n> registered' *
xxx_3006.log:   process 3006 <xxxnet_n> registered in entry 12
xxx_3013.log:   process 3013 <xxxnet_n> registered in entry 13
xxx_3014.log:   process 3014 <xxxnet_n> registered in entry 14
xxx_3015.log:   process 3015 <xxxnet_n> registered in entry 15
xxx_3017.log:   process 3017 <xxxnet_n> registered in entry 16
xxx_3054.log:   process 3054 <xxxnet_n> registered in entry 17
xxx_3057.log:   process 3057 <xxxnet_n> registered in entry 18
xxx_3058.log:   process 3058 <xxxnet_n> registered in entry 19
xxx_3074.log:   process 3074 <xxxnet_n> registered in entry 21
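Incidentally, I guess the first pass of any script could use grep -l, which lists just the names of the matching files rather than the matching lines:

grep -l '<xxxnet_n> registered' xxx_*.log

That list could then feed a loop over just the listener logs.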

OK - now, in each of THESE files, there is important debugging information that looks like this:
.
.
3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.893821      netsig.c171
        net process: adding process 6781 in Unregister list.

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.894414      netsig.c353
        Kernel Process 6781 has died

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.894989      netsig.c179
        net process: process 6781 set to Zombie

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.895837      ipcmisc.c299
        State information for process 6781, User=_USERID, Role=*ALL, Environment=JETDV, Profile=NONE, Application=P400511, Client Machine=xxxpc, Version=SR0001, Thread ID=6, Thread Name=WRK:_USERID_004F1600_P400511.

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.896964      ipcmisc.c299
        Call stack for process 6781, thread 6:
libCSALES.so/UNKNOWN/F4211FSBeginDoc

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.897278      ipcmisc.c299
        API ipcSawZombieProcV1 : process 6781 set to Zombie in entry 48
.
.

Now - the first 4 characters (3074) indicate the pid of the process that created the log file (in the above example, it's "xxx_3074.log")

This first file can be very large - since this "listener" process logs all of the other processes that fail and go to "Zombie" state.  (Note that "Zombie" here is not the same as a true Unix "zombie" process - PID 6781 doesn't exist at all at this point!)

The xxx_6781.log file doesn't necessarily have any further information that we need (it's unreliable).

So, this is what I would like to happen.  Using some sort of sed or awk script, I'd like to go through all files in a directory (*), find just the xxxnet_n process logs, and then, for each of these, extract the following information and pipe it to some file:

Date, Time, Failing PID, Listener PID, User, Role, Environment, Profile, Client Machine, Application, Version, Call Stack

So, grabbing all the above information in the example, I'd like to extract the following:

Wed Apr  7,14:16:34.893821,6781,3074,_USERID,*ALL,JETDV,NONE,xxxpc,P400511,SR0001,libCSALES.so/UNKNOWN/F4211FSBeginDoc
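To show the shape I'm after, here is a rough, untested nawk sketch (nawk rather than /usr/bin/awk, since the stock Solaris awk is the old one; the field positions and markers are guessed from the sample blocks above, and the script name is made up):

#!/bin/sh
# zombie_csv.sh - untested sketch: one CSV line per zombie event,
# from the listener logs named on the command line.
# Note: the date's internal run of spaces collapses ("Wed Apr 7").
for f in "$@"
do
    # skip anything that is not a listener (xxxnet_n) log
    grep '<xxxnet_n> registered' "$f" >/dev/null || continue
    nawk '
    # remember the latest header line: "pid/thread NAME  date time file"
    /^[0-9][0-9]*\// { split($0, h); next }

    /adding process .* in Unregister list/ {
        lpid = h[1]; sub(/\/.*/, "", lpid)          # listener pid
        date = h[3] " " h[4] " " h[5]               # e.g. Wed Apr 7
        time = h[6]                                 # e.g. 14:16:34.893821
        fpid = $0
        sub(/.*adding process /, "", fpid)
        sub(/ in Unregister list.*/, "", fpid)      # failing pid
    }

    /State information for process/ {
        # pull the key=value pairs out of the comma-separated list
        n = split($0, kv, ", ")
        for (i = 1; i <= n; i++) {
            eq = index(kv[i], "=")
            if (eq) v[substr(kv[i], 1, eq - 1)] = substr(kv[i], eq + 1)
        }
    }

    /Call stack for process/ { want = 1; next }

    want && NF {
        # first line after the "Call stack" header is the stack itself
        printf "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n",
            date, time, fpid, lpid, v["User"], v["Role"],
            v["Environment"], v["Profile"], v["Client Machine"],
            v["Application"], v["Version"], $0
        want = 0
    }
    ' "$f"
done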

Here's the rub.  I want to crontab this - running every 30 minutes or so - and I'd only want to extract NEW information to the file (so after doing this once, I'd like to take the date/time into consideration and only look for information that has been created in the past 30 minutes).
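For the incremental part, I was thinking of something like a marker file as the watermark between cron runs (a sketch only - the marker path, log directory and script name are all made up):

#!/bin/sh
# cron wrapper sketch: rescan only logs touched since the last run
MARK=/var/tmp/zombie_scan.last
NEW=/var/tmp/zombie_scan.$$
touch "$NEW"                  # stamp taken BEFORE scanning starts
if [ -f "$MARK" ]
then
    FILES=`find /path/to/logs -name 'xxx_*.log' -newer "$MARK" -print`
else
    FILES=`find /path/to/logs -name 'xxx_*.log' -print`
fi
[ -n "$FILES" ] && ./zombie_csv.sh $FILES >> zombies.csv
mv "$NEW" "$MARK"
# A rescanned file is re-read from the top, so events already captured
# will repeat; since every line carries a date/time key, a crude dedup is:
#   sort -u -o zombies.csv zombies.csv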

I'd be VERY grateful to anyone who can help me with this task!  Thank you very much in advance....

wietman

So I take it the files are appended to, and you will need to recheck each file for information that has been appended since the last time your script scanned?

If so, you will need to store some kind of LAST_SCAN marker.  This could be tricky: suppose your job scans the last file 5 minutes after the first file was scanned - I think you see the issue.  If this is not dealt with correctly you will have an opportunity to either miss data or duplicate it.
Also, it looks like you need information after the grep'ed line.
How do you know what the starting and ending point of each piece of data you want is?

i.e. I find a starting point for the data I need using grep.  How do I know when to stop collecting the data from a particular file?
altquark

ASKER

Right - but there is a key field in the output file based on Date/Time.

Before appending to the file, I guess it would be possible to "grep -c" for the values it is about to append, to see if they already exist in the output - otherwise skip and go to the next one?

However, this would be a secondary point.  I think the primary one is how to grab the output at least for the first time (I could always just delete the output file before each run initially!)
wietman - good question on info after the grep'ed line.  I would think that all the information would be contained between the starting marker 'in Unregister list' and the ending marker 'set to Zombie'.

However, I believe that each failure is somewhat similar - so looking for 'State information for process' and grabbing the prior 3 paragraphs and the following 2 paragraphs might work also.
So it sounds like you need grep -n to get the line number of some data you want, then you need to pull lines (x-10, for example) through (x+10).
This is the technique you would use to get the "prior 3 paragraphs and the following 2 paragraphs".

To get a start and stop, you might use grep -n to get your start line number and another grep -n to get your stop number, then you might use ed to delete the lines you don't want (1,xd and y,$d) and cat or list the file to your output.  This is VERY clunky, but I could probably throw something together to do this.
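For instance, roughly (untested, and using sed -n to print the window rather than ed deleting around it - same idea, fewer steps):

# first occurrences only - a real script would loop per event
start=`grep -n 'in Unregister list' xxx_3074.log | head -1 | cut -d: -f1`
stop=`grep -n 'Zombie in entry' xxx_3074.log | head -1 | cut -d: -f1`
[ -n "$start" ] && [ -n "$stop" ] && sed -n "${start},${stop}p" xxx_3074.log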
I have to leave now, but I will be back online in 15 hours from now.  If you send me a single file and tell me specifically what data you want, as an attachment, I can try to put something together for you tomorrow morning when I get back in.
Wouldn't awk or sed be better?  It seems you're trying to use ed?  I'd want this script run many times during the day on thousands of files.  There is a lot of junk in a lot of these files that I'm not interested in - but awk should be able to search for specific terms.

I'm uploading an example file as requested.
xxx-3074.log
I have a hunch awk may be the way to go.
Unfortunately I have only dabbled in awk and really don't have a good base in it.
I can get you something that works, and in the meantime I'll contact one of the moderators and see if we can get an awk expert to offer you an alternative.

BTW: you don't have a C or GNU C compiler, do you?  I could really make you something that would rock in C.
OK.  I think I found a way to do this with sed.  I'll need about 15-20 minutes.
Try something like this in a shell script, where $1 is a file name:

sed -n '
/Unregister list/,/Zombie in entry/ {
/^/p
}
' $1
I like this one a little better.
It delimits the start of each entry in the file with 'XXXXXX' but it could be anything you want.

sed -n '
/Unregister list/,/Zombie in entry/ {
s/\(.* Unregister list.*\)/ XXXXXX \1/
/^/p
}
' $1
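Run it as, say, sh extract.sh xxx_3074.log (the script name is just an example); each event block in the output then starts with an XXXXXX line, which should be easy to split on later.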

I definitely want something in awk, I think.  If anyone has any assistance with awk, I would be extremely thankful.  I'm not sure that sed is going to work for this requirement.
What requirement is not being addressed?
I would be glad to address anything that is not covered.
Each time I try these sed commands, by the way, I'm getting "sed: command garbled"

I tried the following :

sed -n '/Unregister/,/Zombie/{/^/p}' *.log

and it tells me its garbled :

sed: command garbled: /Unregister/,/Zombie/{/^/p}

Any idea?
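That "command garbled" is most likely the stock Solaris sed being stricter than GNU sed about braces: the command inside { } has to be terminated by a newline or a semicolon before the closing }.  The multi-line versions above put } on its own line, which is why they pass; for the one-liner, try adding the semicolon:

sed -n '/Unregister/,/Zombie/{/^/p;}' xxx_3074.log

(Or try /usr/xpg4/bin/sed if it's installed.)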
Is it that you only want to see this for each issue?

So, grabbing all the above information in the example, I'd like to extract the following:

Wed Apr  7,14:16:34.893821,6781,3074,_USERID,*ALL,JETDV,NONE,xxxpc,P400511,SR0001,libCSALES.so/UNKNOWN/F4211FSBeginDoc

Right.  By the way, I noticed I hadn't picked the "Shell Scripting" zone.  Would you be annoyed if I closed this question and reopened it, including the "Shell Scripting" zone?
I have no problem with that.  You might get more input, but I think you can choose multiple zones.  You might be able to just add zones.
ASKER CERTIFIED SOLUTION
wietman

Again, I'm getting "command garbled":

sed: command garbled: /Unregister list/,/Zombie in entry/ {/State information/p}

I don't think this is working at all well with sed.  I really would like to use awk if possible.
Try specifying just one file at first and see if that works.  If there are a lot of files matching *.log, you could be running into a command-line buffer issue.

I did test this with the file you uploaded and it works for me, but that was on AIX.  I have access to a Sun box, so I am migrating my testing to that for further debugging.

You are the requestor.  So I certainly respect your desire to have this done using whatever tool you wish.
This does work fine for me on Solaris as well with your sample file.

I think you are running into a command-line buffer issue or some issue with cutting and pasting from this website into your Unix window.
Again, try specifying just one file at first and see if that works.
I'm closing this because the zone it's assigned to is not ideal.  I've re-opened the request in the Shell Scripting zone and specified Awk.  Thank you, wietman, for your help.
This has been completely solved in another solution using awk, located here:

https://www.experts-exchange.com/questions/25960064/Awk-assistance-to-grab-information-from-a-Text-File.html

Thank you for all your help anyway.