altquark

asked on

Gawk/Sed assistance for some text files

Hi everyone

I have a lot of horrible log files - and I'm trying to extract the valid information from them into a nice CSV format or something similar.  The system I'm on is a Solaris system (SunOS xxxxname 5.10 Generic_138888-08 sun4v sparc SUNW,SPARC-Enterprise-T5220) - so I have grep, awk and sed - but I don't have grep -A/-B.

There are a large number of log files - all of the files are named "xxx_{pid}.log" - so, for example, I have files "xxx_3074.log" and "xxx_6781.log"

Now, only SOME of these files have the important information I'm seeking.  First of all, the only files I'm interested in are those that have the following information in the first few lines (example follows):

3074/1 MAIN_THREAD                              Wed Apr  7 06:23:10.652804      ipcmisc.c299
        process 3074 <xxxnet_n> registered in entry 21

In effect, the key is "<xxxnet_n> registered" - no other files have this.  Here is an example of my grepping for this:

grep '<xxxnet_n> registered' *
xxx_3006.log:   process 3006 <xxxnet_n> registered in entry 12
xxx_3013.log:   process 3013 <xxxnet_n> registered in entry 13
xxx_3014.log:   process 3014 <xxxnet_n> registered in entry 14
xxx_3015.log:   process 3015 <xxxnet_n> registered in entry 15
xxx_3017.log:   process 3017 <xxxnet_n> registered in entry 16
xxx_3054.log:   process 3054 <xxxnet_n> registered in entry 17
xxx_3057.log:   process 3057 <xxxnet_n> registered in entry 18
xxx_3058.log:   process 3058 <xxxnet_n> registered in entry 19
xxx_3074.log:   process 3074 <xxxnet_n> registered in entry 21
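Incidentally, I guess the first pass of any script could use grep -l, which lists just the names of the matching files rather than the matching lines:

grep -l '<xxxnet_n> registered' xxx_*.log

That list could then feed a loop over just the listener logs.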

OK - now, in each of THESE files, there is important debugging information that looks like this:
.
.
3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.893821      netsig.c171
        net process: adding process 6781 in Unregister list.

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.894414      netsig.c353
        Kernel Process 6781 has died

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.894989      netsig.c179
        net process: process 6781 set to Zombie

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.895837      ipcmisc.c299
        State information for process 6781, User=_USERID, Role=*ALL, Environment=JETDV, Profile=NONE, Application=P400511, Client Machine=xxxpc, Version=SR0001, Thread ID=6, Thread Name=WRK:_USERID_004F1600_P400511.

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.896964      ipcmisc.c299
        Call stack for process 6781, thread 6:
libCSALES.so/UNKNOWN/F4211FSBeginDoc

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.897278      ipcmisc.c299
        API ipcSawZombieProcV1 : process 6781 set to Zombie in entry 48
.
.

Now - the first 4 characters (3074) indicate the pid of the process that created the log file (in the above example, it's "xxx_3074.log")

This first file can be very large - since this "listener" process logs all of the other processes that fail and go to "Zombie" state.  (Note that "Zombie" here is not the same as a true Unix "zombie" process - PID 6781 doesn't exist at all at this point!)

The xxx_6781.log file doesn't necessarily have any further information that we need (it's unreliable).

So, this is what I would like to happen.  Using some sort of sed or awk script, I'd like to go through all files in a directory (*), find just the xxxnet_n process logs, and then, for each of these, extract the following information and pipe it to some file:

Date, Time, Failing PID, Listener PID, User, Role, Environment, Profile, Client Machine, Application, Version, Call Stack

So, grabbing all the above information in the example, I'd like to extract the following:

Wed Apr  7,14:16:34.893821,6781,3074,_USERID,*ALL,JETDV,NONE,xxxpc,P400511,SR0001,libCSALES.so/UNKNOWN/F4211FSBeginDoc
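To show the shape I'm after, here is a rough, untested nawk sketch (nawk rather than /usr/bin/awk, since the stock Solaris awk is the old one; the field positions and markers are guessed from the sample blocks above, and the script name is made up):

#!/bin/sh
# zombie_csv.sh - untested sketch: one CSV line per zombie event,
# from the listener logs named on the command line.
# Note: the date's internal run of spaces collapses ("Wed Apr 7").
for f in "$@"
do
    # skip anything that is not a listener (xxxnet_n) log
    grep '<xxxnet_n> registered' "$f" >/dev/null || continue
    nawk '
    # remember the latest header line: "pid/thread NAME  date time file"
    /^[0-9][0-9]*\// { split($0, h); next }

    /adding process .* in Unregister list/ {
        lpid = h[1]; sub(/\/.*/, "", lpid)          # listener pid
        date = h[3] " " h[4] " " h[5]               # e.g. Wed Apr 7
        time = h[6]                                 # e.g. 14:16:34.893821
        fpid = $0
        sub(/.*adding process /, "", fpid)
        sub(/ in Unregister list.*/, "", fpid)      # failing pid
    }

    /State information for process/ {
        # pull the key=value pairs out of the comma-separated list
        n = split($0, kv, ", ")
        for (i = 1; i <= n; i++) {
            eq = index(kv[i], "=")
            if (eq) v[substr(kv[i], 1, eq - 1)] = substr(kv[i], eq + 1)
        }
    }

    /Call stack for process/ { want = 1; next }

    want && NF {
        # first line after the "Call stack" header is the stack itself
        printf "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n",
            date, time, fpid, lpid, v["User"], v["Role"],
            v["Environment"], v["Profile"], v["Client Machine"],
            v["Application"], v["Version"], $0
        want = 0
    }
    ' "$f"
done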

Here's the rub.  I want to crontab this - running every 30 minutes or so - and I'd only want to extract NEW information to the file (so after doing this once, I'd like to take the date/time into consideration and only look for information that has been created in the past 30 minutes).
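For the incremental part, I was thinking of something like a marker file as the watermark between cron runs (a sketch only - the marker path, log directory and script name are all made up):

#!/bin/sh
# cron wrapper sketch: rescan only logs touched since the last run
MARK=/var/tmp/zombie_scan.last
NEW=/var/tmp/zombie_scan.$$
touch "$NEW"                  # stamp taken BEFORE scanning starts
if [ -f "$MARK" ]
then
    FILES=`find /path/to/logs -name 'xxx_*.log' -newer "$MARK" -print`
else
    FILES=`find /path/to/logs -name 'xxx_*.log' -print`
fi
[ -n "$FILES" ] && ./zombie_csv.sh $FILES >> zombies.csv
mv "$NEW" "$MARK"
# A rescanned file is re-read from the top, so events already captured
# will repeat; since every line carries a date/time key, a crude dedup is:
#   sort -u -o zombies.csv zombies.csv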

I'd be VERY grateful to anyone who can help me with this task!  Thank you very much in advance....

wietman

So I take it the files are appended to, and you will need to recheck each file for information that has been appended since the last time your script scanned?

If so, you will need to store some kind of LAST_SCAN marker.  This could be tricky: suppose your job scans the last file 5 minutes after the first file was scanned - I think you see the issue.  If this is not dealt with correctly you will have an opportunity to either miss data or duplicate it.
Also, it looks like you need information after the grep'ed line.
How do you know what the starting and ending point of each piece of data you want is?

i.e. I find a starting point for the data I need using grep.  How do I know when to stop collecting the data from a particular file?
altquark

ASKER

Right - but there is a key field in the output file based on Date/Time.

Before appending to the file, I guess it would be possible to "grep -c" for the values it is about to append, to see if they already exist in the output - otherwise skip and go to the next one?

However, this would be a secondary point.  I think the primary one is how to grab the output at least for the first time (I could always just delete the output file before each run initially!)
wietman - good question on info after the grep'ed line.  I would think that all the information would be contained between the starting marker 'in Unregister list' and the ending marker 'set to Zombie'.

However, I believe that each failure is somewhat similar - so looking for 'State information for process' and grabbing the prior 3 paragraphs and the following 2 paragraphs might work also.
So it sounds like you need grep -n to get the line number of some data you want, then you need to pull lines (x-10, for example) through (x+10).
This is the technique you would use to get the "prior 3 paragraphs and the following 2 paragraphs".

To get a start and stop, you might use grep -n to get your start line number and another grep -n to get your stop number, then you might use ed to delete the lines you don't want (1,xd and y,$d) and cat or list the file to your output.  This is VERY clunky, but I could probably throw something together to do this.
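For instance, roughly (untested, and using sed -n to print the window rather than ed deleting around it - same idea, fewer steps):

# first occurrences only - a real script would loop per event
start=`grep -n 'in Unregister list' xxx_3074.log | head -1 | cut -d: -f1`
stop=`grep -n 'Zombie in entry' xxx_3074.log | head -1 | cut -d: -f1`
[ -n "$start" ] && [ -n "$stop" ] && sed -n "${start},${stop}p" xxx_3074.log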
I have to leave now, but I will be back online in 15 hours from now.  If you send me a single file and tell me specifically what data you want, as an attachment, I can try to put something together for you tomorrow morning when I get back in.
Wouldn't awk or sed be better?  It seems you're trying to use ed?  I'd want this script run many times during the day on thousands of files.  There is a lot of junk in a lot of these files that I'm not interested in - but awk should be able to search for specific terms.

I'm uploading an example file as requested.
xxx-3074.log
I have a hunch awk may be the way to go.
Unfortunately I have only dabbled in awk and really don't have a good base in it.
I can get you something that works, and in the meantime I'll contact one of the moderators and see if we can get an awk expert to offer you an alternative.

BTW: you don't have a C or GNU C compiler, do you?  I could really make you something that would rock in C.
OK.  I think I found a way to do this with sed.  I'll need about 15-20 minutes.
Try something like this in a shell script, where $1 is a file name:

sed -n '
/Unregister list/,/Zombie in entry/ {
/^/p
}
' $1
I like this one a little better.
It delimits the start of each entry in the file with 'XXXXXX' but it could be anything you want.

sed -n '
/Unregister list/,/Zombie in entry/ {
s/\(.* Unregister list.*\)/ XXXXXX \1/
/^/p
}
' $1
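Run it as, say, sh extract.sh xxx_3074.log (the script name is just an example); each event block in the output then starts with an XXXXXX line, which should be easy to split on later.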

I definitely want something in awk, I think.  If anyone has any assistance with awk, I would be extremely thankful.  I'm not sure that sed is going to work for this requirement.
What requirement is not being addressed?
I would be glad to address anything that is not covered.
Each time I try these sed commands, by the way, I'm getting "sed: command garbled"

I tried the following :

sed -n '/Unregister/,/Zombie/{/^/p}' *.log

and it tells me its garbled :

sed: command garbled: /Unregister/,/Zombie/{/^/p}

Any idea?
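That "command garbled" is most likely the stock Solaris sed being stricter than GNU sed about braces: the command inside { } has to be terminated by a newline or a semicolon before the closing }.  The multi-line versions above put } on its own line, which is why they pass; for the one-liner, try adding the semicolon:

sed -n '/Unregister/,/Zombie/{/^/p;}' xxx_3074.log

(Or try /usr/xpg4/bin/sed if it's installed.)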
Is it that you only want to see this for each issue?

So, grabbing all the above information in the example, I'd like to extract the following:

Wed Apr  7,14:16:34.893821,6781,3074,_USERID,*ALL,JETDV,NONE,xxxpc,P400511,SR0001,libCSALES.so/UNKNOWN/F4211FSBeginDoc

Right.  By the way, I noticed I hadn't picked the "Shell Scripting" zone.  Would you be annoyed if I closed this question and reopened it, including the "Shell Scripting" zone?
I have no problem with that.  You might get more input, but I think you can choose multiple zones.  You might be able to just add zones.
ASKER CERTIFIED SOLUTION
wietman

Again, I'm getting "command garbled":

sed: command garbled: /Unregister list/,/Zombie in entry/ {/State information/p}

I don't think this is working at all well with sed.  I really would like to use awk if possible.
Try specifying just one file at first and see if that works.  If there are a lot of files matching *.log, you could be running into a command-line buffer issue.

I did test this with the file you uploaded and it works for me, but that was on AIX.  I have access to a Sun box, so I am migrating my testing to that for further debugging.

You are the requestor.  So I certainly respect your desire to have this done using whatever tool you wish.
This does work fine for me on Solaris as well with your sample file.

I think you are running into a command-line buffer issue or some issue with cutting and pasting from this website into your Unix window.
Again, try specifying just one file at first and see if that works.
I'm closing this because the zone it's assigned to is not ideal.  I've re-opened the request in the Shell Scripting zone and specified Awk.  Thank you, wietman, for your help.
This has been completely solved in another solution using awk, located here:

https://www.experts-exchange.com/questions/25960064/Awk-assistance-to-grab-information-from-a-Text-File.html

Thank you for all your help anyway.