Awk assistance to grab information from a Text File

Hi everyone

I have a lot of horrible log files - and I'm trying to get the files to output valid information to a nice CSV format or something similar.  The system I'm on is a Solaris system (SunOS xxxxname 5.10 Generic_138888-08 sun4v sparc SUNW,SPARC-Enterprise-T5220) - so I have grep, awk and sed - but I don't have grep -A/-B.

There are a large number of log files - all of the files are listed as "xxx_{pid}.log" - so, for example, I have files "xxx_3074.log" and "xxx_6781.log"

Now, there are SOME of these files that have important information that I'm seeking.  First of all, the only files I'm interested in are those that have the following information in the first few lines (example follows):

3074/1 MAIN_THREAD                              Wed Apr  7 06:23:10.652804      ipcmisc.c299
        process 3074 <xxxnet_n> registered in entry 21

In effect, the key is "<xxxnet_n> registered" - no other files contain this string.  Here is an example of grepping for it:

grep '<xxxnet_n> registered' *
xxx_3006.log:   process 3006 <xxxnet_n> registered in entry 12
xxx_3013.log:   process 3013 <xxxnet_n> registered in entry 13
xxx_3014.log:   process 3014 <xxxnet_n> registered in entry 14
xxx_3015.log:   process 3015 <xxxnet_n> registered in entry 15
xxx_3017.log:   process 3017 <xxxnet_n> registered in entry 16
xxx_3054.log:   process 3054 <xxxnet_n> registered in entry 17
xxx_3057.log:   process 3057 <xxxnet_n> registered in entry 18
xxx_3058.log:   process 3058 <xxxnet_n> registered in entry 19
xxx_3074.log:   process 3074 <xxxnet_n> registered in entry 21

OK - now, in each of THESE files, there is important debugging information that looks like this.  
3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.893821      netsig.c171
        net process: adding process 6781 in Unregister list.

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.894414      netsig.c353
        Kernel Process 6781 has died

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.894989      netsig.c179
        net process: process 6781 set to Zombie

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.895837      ipcmisc.c299
        State information for process 6781, User=_USERID, Role=*ALL, Environment=JETDV, Profile=NONE, Application=P400511, Client Machine=xxxpc, Version=SR0001, Thread ID=6, Thread Name=WRK:_USERID_004F1600_P400511.

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.896964      ipcmisc.c299
        Call stack for process 6781, thread 6:

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.897278      ipcmisc.c299
        API ipcSawZombieProcV1 : process 6781 set to Zombie in entry 48

Now - the first 4 characters (3074) indicate the pid of the process that has created the log file (in the above example, it's "xxx_3074.log")

This first file can be very large - since this "listener" process logs all of the other processes that fail and go to "Zombie" state.  (Note that "Zombie" is not specifically the same reference as a true "Zombie" unix process - the PID 6781 doesn't exist at this point !).  

The xxx_6781.log file doesn't necessarily have any further information that we need (it's unreliable).

So, this is what I would like to occur.  Using some sort of sed or awk script, I'd like to be able to go through all files in a directory (*) looking just for the xxxnet_n process logs, then going through each of these, to be able to extract the following information and pipe this to some file:

Date, Time,Failing PID, Listener PID, User, Role, Environment, Profile, Client Machine, Application, Version, Call Stack

SO, grabbing all the above information in the example, I'd like to extract the following :

Wed Apr  7,14:16:34.893821,6781,3074,_USERID,*ALL,JETDV,NONE,xxxpc,P400511,SR0001,libCSALES.so/UNKNOWN/F4211FSBeginDoc

Here's the rub.  I want to crontab this - running every 30 minutes or so, and I'd only want to extract NEW information to the file (so after doing this once, I'd like to take the date/time into consideration and only look for information that has been created in the past 30 minutes)

I'd be VERY grateful to anyone that can help me with this task !  Thank you very much in advance....
1 Solution
The following might help.

Every time it runs, it stores the current time in a file in the current directory called "lastrun.time" - it uses this to work out the date/time (to the second) when it last ran.

Then, for each .log file in the current directory which contains the text "<xxxnet_n> registered", it looks through all log entries on or after the last run time.

When it comes across the two log lines with "State information for process" and "Call stack for process", it prints out the required information from those two lines.

It always prints out the CSV header, even if it doesn't find any lines.
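
As a rough illustration of the extraction step, the sketch below pushes two sample entries from the question through the same tr and awk field positions. The function name extract_rows is made up for the demo, and the call-stack line is borrowed from the expected output in the question, since the sample log does not show one.

```shell
# Hypothetical helper: reads log text on stdin, prints one CSV row per failure.
extract_rows() {
  # Turn "=" and "," into spaces so every key and value becomes its own awk field.
  tr '=,' '  ' | awk '
    # Entry header, e.g. "3074/1 MAIN_THREAD Wed Apr 7 14:16:34.895837 ipcmisc.c299"
    /^[0-9]+\/[0-9]+/ { datetime = $3 " " $4 " " $5 "," $6
                        split($1, id, "/"); lpid = id[1] }
    # Fields: 5=failing pid, 7=user, 9=role, 11=environment, 13=profile,
    #         18=client machine, 15=application, 20=version
    /State information for process/ {
      row = datetime "," $5 "," lpid "," $7 "," $9 "," $11 "," $13 "," $18 "," $15 "," $20 }
    # The line after "Call stack for process" is taken as the call stack.
    /Call stack for process / { getline; print row "," $0 }'
}

# Sample entries from the question; the last line is an assumed call-stack line.
sample=$(printf '%s\n' \
  '3074/1 MAIN_THREAD      Wed Apr  7 14:16:34.895837      ipcmisc.c299' \
  '        State information for process 6781, User=_USERID, Role=*ALL, Environment=JETDV, Profile=NONE, Application=P400511, Client Machine=xxxpc, Version=SR0001, Thread ID=6, Thread Name=WRK:_USERID_004F1600_P400511.' \
  '' \
  '3074/1 MAIN_THREAD      Wed Apr  7 14:16:34.896964      ipcmisc.c299' \
  '        Call stack for process 6781, thread 6:' \
  'libCSALES.so/UNKNOWN/F4211FSBeginDoc' | extract_rows)
echo "$sample"
```

Note that the timestamp in the row is the one on the "State information" entry, not the time of the first "Unregister list" entry for that process.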

To run this script, save the attached file in a script directory as parse_nasty.sh, make it executable (chmod +x parse_nasty.sh) and in your crontab entry have:

0,30 * * * * cd /directory/where/logs/are ; /path/to/script/parse_nasty.sh

It will then process all ".log" files in the /directory/where/logs/are directory.
: Parse nasty log files for entries since the last run time

lastrunfile=lastrun.time
datescan=/tmp/datescan.$$    # temporary Perl script; exact path is an assumption

# get last run time (seconds since the epoch) from file - default to the epoch
if [ -r $lastrunfile ]
then
  lr=$(cat $lastrunfile)
  rm $lastrunfile
else
  lr=0
fi
# save the current time as the new "last run time"
perl -e 'print time();' > $lastrunfile

# create a temporary Perl script which prints out the line
# number of the first line on or after the date in $lastrunfile
trap "rm -f $datescan" 0
cat > $datescan <<EOF
#!$(which perl) --
use Time::Local;
my \$stim=$lr;
my \$yr=$(date '+%Y')-1900;
EOF

cat >> $datescan <<\EOF
%mon2num = qw(jan 1  feb 2  mar 3  apr 4  may 5  jun 6
              jul 7  aug 8  sep 9  oct 10 nov 11 dec 12);
my $lcount=0;
while (<>)
{
        $lcount++;
        @inl = split;
        # Date format in file, starts with field 2: Wed Apr  7 06:23:10.652804
        if (($inl[0] =~ /^[0-9]*\/[0-9]*$/) && (timelocal(substr($inl[5],6,2),substr($inl[5],3,2),substr($inl[5],0,2),$inl[4],$mon2num{ lc $inl[3]} - 1,$yr) >= $stim)) {
                printf("%d\n", $lcount);
                exit;
        }
}
print 0;
EOF
chmod +x $datescan

echo "Date,Time,Failing PID,Listener PID,User,Role,Environment,Profile,Client Machine,Application,Version,Call Stack"
for ii in $(grep -l '<xxxnet_n> registered' *.log)
do
  # We have a log file with the right entries in it - search after
  # the "last run" time, and extract the required data
  lrun=$($datescan < $ii)
  if [ $lrun -gt 0 ]
  then
    # File contains lines after the lastruntime - parse them with awk
    sed -n "$lrun,\$p" $ii | tr '=,' '  ' | awk '/^[0-9]*\/[0-9]*/{datetime=$3 " " $4 " " $5 "," $6; split($1,hp,"/"); lpid=hp[1];}
/State information for process/{linept1=datetime "," $5 "," lpid "," $7 "," $9 "," $11 "," $13 "," $18 "," $15 "," $20}
/Call stack for process /{getline;print linept1 "," $0}'
  fi
done

exit 0

As *soon* as I pressed "Submit", I spotted a potential problem.  The script writes the CSV lines out to standard out, so with the above crontab entry, you would get a mail every half hour with the output lines in it - this *may* be what you want!

In case it isn't, and you actually want the data in a file, modify the script as follows:

Replace the currently empty line 2 with:
  outfile=parsed_nasty_$(date '+%Y%m%d_%H%M%S').csv

On line 45 (the echo "Date,...) add the following to the end of the line:
   > $outfile
and on line 56 (the "Call stack" line of the awk script) add the following to the end of the line:
   >> $outfile

This will write a file with the name parsed_nasty_YYYYMMDD_HHMMSS.csv (where YYYY is the current 4-digit year, MM the month, DD the day, HHMMSS the hour/minute/second).
altquark (Author) commented:
hi simon.


I had to initially generate a lastrun.time file with "0000000000" to catch up the first lot of files - but it runs superbly !

I'm going to run this over the next day or so and keep this question open - just in case I get any issues - but I'll be awarding points to you by Friday unless I get issues - I hope that's ok.

Thank you very, very much.  I'll try to award bonus points to you for this work too !

altquark (Author) commented:
one comment, simon.

Is it possible to modify the script to ONLY create the output file if there are valid variables?  We're only going to get entries infrequently (we hope) ?  What would I need to modify to achieve this ?

I can think of a couple of ways - either we only write the header line if we find data to write, or we create the file and then delete it at the end if it only has one line in it (i.e. only the header line).

The first one is a bit more complex (you would need to write the "sed" output to a temporary file, then print the header line if the temporary file has data in it).  The second is easier to do - just add the following after line 58 (i.e. after the "done" but before the "exit 0" at the bottom of the script):

if [ $(cat $outfile | wc -l) -eq 1 ]
then
  rm $outfile
fi
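
For the first option, something along these lines might work. It is only a sketch: the function name write_if_rows and the temporary file names are invented for the example. The idea is that the awk output is appended to a temporary rows file instead of $outfile, and the header plus rows are only written out if that file ends up non-empty.

```shell
# Hypothetical helper: $1 = temporary file of extracted CSV rows, $2 = final CSV file.
write_if_rows() {
  rows=$1
  out=$2
  if [ -s "$rows" ]      # -s is true only if the file exists and is not empty
  then
    {
      echo "Date,Time,Failing PID,Listener PID,User,Role,Environment,Profile,Client Machine,Application,Version,Call Stack"
      cat "$rows"
    } > "$out"
  fi
  rm -f "$rows"          # the temporary rows file is no longer needed
}

# Demo with assumed file names: an empty rows file means no CSV gets created.
rowsfile=${TMPDIR:-/tmp}/nasty_rows.$$
outfile=${TMPDIR:-/tmp}/parsed_nasty_demo.$$.csv
: > "$rowsfile"
write_if_rows "$rowsfile" "$outfile"
[ -f "$outfile" ] || echo "no output file created"
```

In the real script you would drop the echo of the header near the top, send the awk output to the rows file with >>, and call the function once after the "done".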
altquark (Author) commented:
Fantastic solution.  I am very impressed with how you managed to come up with an entire solution like that in just a couple of posts.  Thank you very much.  I will be looking out for you again !
Question has a verified solution.
