Awk assistance to grab information from a Text File

Hi everyone

I have a lot of horrible log files, and I'm trying to extract the useful information from them into a nice CSV format or something similar.  The system I'm on is Solaris (SunOS xxxxname 5.10 Generic_138888-08 sun4v sparc SUNW,SPARC-Enterprise-T5220), so I have grep, awk and sed - but I don't have grep -A/-B.

There are a large number of log files - all named "xxx_{pid}.log" - so, for example, I have files "xxx_3074.log" and "xxx_6781.log"

Now, SOME of these files have the important information I'm seeking.  First of all, the only files I'm interested in are those that contain the following in their first few lines (example follows):

3074/1 MAIN_THREAD                              Wed Apr  7 06:23:10.652804      ipcmisc.c299
        process 3074 <xxxnet_n> registered in entry 21

In effect, the key is "<xxxnet_n> registered" - no other files contain this.  Here is an example of grepping for it:

grep '<xxxnet_n> registered' *
xxx_3006.log:   process 3006 <xxxnet_n> registered in entry 12
xxx_3013.log:   process 3013 <xxxnet_n> registered in entry 13
xxx_3014.log:   process 3014 <xxxnet_n> registered in entry 14
xxx_3015.log:   process 3015 <xxxnet_n> registered in entry 15
xxx_3017.log:   process 3017 <xxxnet_n> registered in entry 16
xxx_3054.log:   process 3054 <xxxnet_n> registered in entry 17
xxx_3057.log:   process 3057 <xxxnet_n> registered in entry 18
xxx_3058.log:   process 3058 <xxxnet_n> registered in entry 19
xxx_3074.log:   process 3074 <xxxnet_n> registered in entry 21
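Since the next step only needs the names of the matching files (not the matching lines themselves), grep's -l flag does the job; a minimal sketch, with sample files created purely for illustration:

```shell
# Select only the listener logs: -l makes grep print the NAME of each
# file containing a match, and stop reading that file at the first hit,
# which is cheap even on large logs.
cd "${TMPDIR:-/tmp}" && mkdir -p greplsample && cd greplsample
printf '\tprocess 3006 <xxxnet_n> registered in entry 12\n' > xxx_3006.log
printf 'no marker in this one\n' > xxx_6781.log
grep -l '<xxxnet_n> registered' *.log     # prints: xxx_3006.log
```

The resulting file list can then be fed straight into a shell for loop.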

OK - now, in each of THESE files, there is important debugging information that looks like this:
3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.893821      netsig.c171
        net process: adding process 6781 in Unregister list.

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.894414      netsig.c353
        Kernel Process 6781 has died

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.894989      netsig.c179
        net process: process 6781 set to Zombie

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.895837      ipcmisc.c299
        State information for process 6781, User=_USERID, Role=*ALL, Environment=JETDV, Profile=NONE, Application=P400511, Client Machine=xxxpc, Version=SR0001, Thread ID=6, Thread Name=WRK:_USERID_004F1600_P400511.

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.896964      ipcmisc.c299
        Call stack for process 6781, thread 6:

3074/1 MAIN_THREAD                              Wed Apr  7 14:16:34.897278      ipcmisc.c299
        API ipcSawZombieProcV1 : process 6781 set to Zombie in entry 48

Now - the first 4 characters (3074) indicate the PID of the process that created the log file (in the above example, it's "xxx_3074.log")

This first file can be very large, since this "listener" process logs all of the other processes that fail and go to "Zombie" state.  (Note that "Zombie" here is not quite the same as a true Unix zombie process - PID 6781 no longer exists at this point!)

The xxx_6781.log file doesn't necessarily have any further information that we need (it's unreliable).

So, this is what I would like to do.  Using some sort of sed or awk script, I'd like to go through all files in a directory (*), find just the xxxnet_n process logs, then go through each of these, extract the following information and pipe it to some file:

Date, Time,Failing PID, Listener PID, User, Role, Environment, Profile, Client Machine, Application, Version, Call Stack

So, grabbing all the above information in the example, I'd like to extract the following:

Wed Apr  7,14:16:34.893821,6781,3074,_USERID,*ALL,JETDV,NONE,xxxpc,P400511,SR0001,
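For a single file (ignoring the "new entries only" requirement for now), the extraction step can be sketched in awk. The field numbers below are assumptions read off the sample lines above; tr flattens the '=' and ',' characters so awk can address the values positionally:

```shell
# Sketch only - field positions assume the exact layout in the samples above.
# tr turns '=' and ',' into spaces so awk can pick fields by number;
# split() takes the listener PID from the "3074/1" header field.
extract() {
  tr '=,' '  ' | awk '
    /^[0-9]*\/[0-9]*/ { datetime = $3 " " $4 " " $5 "," $6
                        split($1, p, "/"); lpid = p[1] }
    /State information for process/ { line = datetime "," $5 "," lpid "," \
        $7 "," $9 "," $11 "," $13 "," $18 "," $15 "," $20 }
    /Call stack for process / { getline; print line "," $0 }'
}

extract <<'SAMPLE'
3074/1 MAIN_THREAD   Wed Apr  7 14:16:34.895837   ipcmisc.c299
        State information for process 6781, User=_USERID, Role=*ALL, Environment=JETDV, Profile=NONE, Application=P400511, Client Machine=xxxpc, Version=SR0001, Thread ID=6, Thread Name=WRK:X.
3074/1 MAIN_THREAD   Wed Apr  7 14:16:34.896964   ipcmisc.c299
        Call stack for process 6781, thread 6:

SAMPLE
```

On the sample this prints "Wed Apr 7,14:16:34.895837,6781,3074,_USERID,*ALL,JETDV,NONE,xxxpc,P400511,SR0001," - the date/time taken is that of the header immediately preceding the "State information" line, and the call-stack field is empty because the line after "Call stack" is blank in the sample.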

Here's the rub.  I want to run this from crontab every 30 minutes or so, and I'd only want to extract NEW information each time (so after the first run, it should take the date/time into consideration and only look for information created in the past 30 minutes)

I'd be VERY grateful to anyone who can help me with this task!  Thank you very much in advance....

The following might help.

Every time it runs, it stores the current time in a file in the current directory called "lastrun.time" - it uses this to work out the date/time (to the second) when it last ran.

Then, for each .log file in the current directory which contains the text "<xxxnet_n> registered", it looks through all log entries on or after the last run time.

When it comes across the two log lines with "State information for process" and "Call stack for process", it prints out the required information from those two lines.

It always prints out the CSV header, even if it doesn't find any lines.

To run this script, save the attached file in a script directory, make it executable (chmod +x), and put the following in your crontab entry:

0,30 * * * * cd /directory/where/logs/are ; /path/to/script/

It will then process all ".log" files in the /directory/where/logs/are directory.
#!/bin/sh

# Parse nasty log files for entries since the last run time

lastrunfile=lastrun.time
datescan=/tmp/datescan.$$

# get last run time (seconds since the epoch) from file - default to the epoch
if [ -r $lastrunfile ]
then
  lr=$(cat $lastrunfile)
  rm $lastrunfile
else
  lr=0
fi
# save the current time as the new "last run time"
perl -e 'print time();' > $lastrunfile

# create a temporary Perl script which prints out the line
# number of the first line on or after the date in $lastrunfile
trap "rm -f $datescan" 0
cat > $datescan <<EOF
#!$(which perl) --
use Time::Local;
my \$stim=$lr;
my \$yr=$(date '+%Y')-1900;
EOF
cat >> $datescan <<\EOF
my %mon2num = qw(jan 1  feb 2  mar 3  apr 4  may 5  jun 6
              jul 7  aug 8  sep 9  oct 10 nov 11 dec 12);
my $lcount=0;
while (<>) {
        $lcount++;
        my @inl = split;
        # Date format in file, starts with field 2: Wed Apr  7 06:23:10.652804
        if (($inl[0] =~ /^[0-9]*\/[0-9]*$/) && (timelocal(substr($inl[5],6,2),substr($inl[5],3,2),substr($inl[5],0,2),$inl[4],$mon2num{ lc $inl[3]} - 1,$yr) >= $stim)) {
                printf("%d\n", $lcount);
                exit 0;
        }
}
# nothing recent enough in this file
print 0;
EOF
chmod +x $datescan

echo "Date,Time,Failing PID,Listener PID,User,Role,Environment,Profile,Client Machine,Application,Version,Call Stack"
for ii in $(grep -l '<xxxnet_n> registered' *.log)
do
  # We have a log file with the right entries in it - search after
  # the "last run" time, and extract the required data
  lrun=$($datescan < $ii)
  if [ $lrun -gt 0 ]
  then
    # File contains lines after the lastruntime - parse them with awk
    sed -n "$lrun,\$p" $ii | tr '=,' '  ' | awk '/^[0-9]*\/[0-9]*/{datetime=$3 " " $4 " " $5 "," $6; lpid=substr($1,1,4);}
/State information for process/{linept1=datetime "," $5 "," lpid "," $7 "," $9 "," $11 "," $13 "," $18 "," $15 "," $20}
/Call stack for process /{getline;print linept1 "," $0}'
  fi
done

exit 0

As *soon* as I pressed "Submit", I spotted a potential problem.  The script writes the CSV lines out to standard out, so with the above crontab entry, you would get a mail every half hour with the output lines in it - this *may* be what you want!

In case it isn't, and you actually want the data in a file, modify the script as follows:

Replace the currently empty line 2 (near the top of the script) with:
  outfile=parsed_nasty_$(date '+%Y%m%d_%H%M%S').csv

On the header line (the echo "Date,..." line), add the following to the end of the line:
   > $outfile
and on the "Call stack" line of the awk script, add the following to the end of the line:
   >> $outfile

This will write a file with the name parsed_nasty_YYYYMMDD_HHMMSS.csv (where YYYY is the current 4-digit year, MM the month, DD the day, HHMMSS the hour/minute/second).
altquarkAuthor Commented:
hi simon.


I had to seed an initial lastrun.time file containing "0000000000" to catch up the first batch of files - but it runs superbly!

I'm going to run this over the next day or so and keep this question open, just in case I hit any issues - but I'll be awarding points to you by Friday unless I do.  I hope that's OK.

Thank you very, very much.  I'll try to award bonus points to you for this work too!


altquarkAuthor Commented:
One comment, Simon.

Is it possible to modify the script to ONLY create the output file if there is valid data?  We're only going to get entries infrequently (we hope).  What would I need to modify to achieve this?

I can think of a couple of ways - either we only write the header line if we find data to write, or we create the file and then delete it at the end if it only has one line in it (i.e. only the header line).

The first one is a bit more complex (you would need to write the "sed" output to a temporary file, then print the header line only if the temporary file has data in it).  The second is easier to do - just add the following after the "done" but before the "exit 0" at the bottom of the script:

if [ $(cat $outfile | wc -l) -eq 1 ]
then
  rm $outfile
fi
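For completeness, the "more complex" first option could look something like this; the names (tmpfile, the sample data line) are illustrative, not from the original script:

```shell
# Only create the CSV when at least one data line was extracted:
# collect the loop's awk output in a temp file, then add the header.
tmpfile=${TMPDIR:-/tmp}/parsed.$$
trap 'rm -f "$tmpfile"' 0
: > "$tmpfile"
# (in the real script, the sed|tr|awk pipeline would append here;
#  a sample data line stands in for it in this sketch)
echo "Wed Apr 7,14:16:34.895837,6781,3074,_USERID,*ALL,JETDV,NONE,xxxpc,P400511,SR0001," >> "$tmpfile"
if [ -s "$tmpfile" ]          # -s: file exists and is non-empty
then
  outfile=parsed_nasty_$(date '+%Y%m%d_%H%M%S').csv
  echo "Date,Time,Failing PID,Listener PID,User,Role,Environment,Profile,Client Machine,Application,Version,Call Stack" > "$outfile"
  cat "$tmpfile" >> "$outfile"
fi
```

With no extracted data the temp file stays empty, the -s test fails, and no CSV file is created at all - so there is nothing to delete afterwards.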
altquarkAuthor Commented:
Fantastic solution.  I am very impressed with how you managed to come up with an entire solution like that in just a couple of posts.  Thank you very much.  I will be looking out for you again!