Solved

A script to clear log file

Posted on 2006-11-07
743 Views
Last Modified: 2013-12-27
Let's say that I have a Java program running, with its standard output directed to a log file:

java MyJavaProgram > stdout.log

The log file, stdout.log, can grow quite big, so I need a script to periodically archive it into a zip file, and then clear the log:

eval `date "+day=%d; month=%m; year=%Y"`
typeset -Z2 day month
typeset -Z4 year
CURRENT_DATE="${year}${month}${day}"
zip stdout.${CURRENT_DATE}.zip stdout.log
echo "Previous contents archived on `date`" > stdout.log

The second-last line of the script zips up the file, and the last line clears the log by writing a message into it.

Is this a good way to do it? When I'm writing the message into the log file in the last line, how can I be sure that the Java program is not also writing to it at the same time? When I've zipped up the file in the second-last line, how can I be sure that the log file is not updated by the Java program before the last line of the script is executed?

Any better suggestion is welcome. Thanks.
Question by:integrosys
26 Comments
 
LVL 3

Accepted Solution

by:
jhartzen earned 50 total points
ID: 17895950
As a general rule, many Unix and Linux variants have the problem which you are describing (it is sometimes called a race condition).  In your case I would say it is worse, because the zip could take quite a while to complete.  I would modify the script slightly, as follows:

...
Do stuff to make new file name
...
cp stdout.log stdout.${CURRENT_DATE}.log
echo "Previous contents archived at `date`" > stdout.log
gzip stdout.${CURRENT_DATE}.log

You will notice that I copy the log file to a new log file with a date-stamp in its name.  This is so that when someone unzips the file, it will not overwrite any existing log files.

Also the copy operation will be quicker than the zip operation, so the time during which you could possibly lose messages is shorter, but in theory it still exists.  The trade-off here is that the copy will use more disk space, at least temporarily.

Also, gzip will generally get better compression.  (It is available by default with Solaris 9 and later, and from sunfreeware.com for earlier versions of Solaris.)

My advice: Don't worry about losing messages, unless the copy takes *very* long.  In practice, it will take just a few seconds.

Hope this helps!
  _J
 

Author Comment

by:integrosys
ID: 17895995
Thanks. What about the race condition when I am doing this:

echo "Previous contents archived at `date`" > stdout.log

The Java program could be writing something to the file while at the same time I am writing my message to the file as well.
 
LVL 3

Expert Comment

by:jhartzen
ID: 17896079
The same condition is true: you can possibly lose data, but the window is short, so the likelihood is small.  You need to weigh the risk against the effort of developing a better solution.  Generally we say the risk is "acceptable".

There are several possible options.

1. Re-write the Java program to perform log management internally.
2. Write a log capture program, which captures the log on stdin, and writes it to log files which it manages cleanly.
3. Pause the Java program (pkill -STOP MyJavaProgram), perform log management, then resume it (pkill -CONT MyJavaProgram).  Obviously this has the disadvantage that MyJavaProgram will hang while you switch the log file.
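For what it's worth, option 3 can be sketched roughly as below.  This is only an illustration: a sleep process stands in for the real JVM, and the file names follow the thread.  (Pausing a real WebLogic instance this way would also stall request handling while the log is switched.)

```shell
# Option 3, sketched with a stand-in writer process instead of the real JVM.
sleep 60 &
WRITER=$!                               # stand-in for "java MyJavaProgram"
echo "some log line" > stdout.log

STAMP=`date +%Y%m%d`
kill -STOP "$WRITER"                    # suspend the writer
cp stdout.log "stdout.${STAMP}.log"     # snapshot the log
: > stdout.log                          # truncate the live file in place
kill -CONT "$WRITER"                    # resume the writer
gzip -f "stdout.${STAMP}.log"           # compress the snapshot
kill "$WRITER" 2>/dev/null              # clean up the stand-in
```

Note that truncating in place resets the file's size but not the writer's own file offset, unless the writer opened the file in append mode.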
 
LVL 48

Expert Comment

by:Tintin
ID: 17896136
Why are you doing

eval `date "+day=%d; month=%m; year=%Y"`
typeset -Z2 day month
typeset -Z4 year
CURRENT_DATE="${year}${month}${day}"

instead of just

CURRENT_DATE=`date +%Y%m%d`

 

Author Comment

by:integrosys
ID: 17896316
Actually it is not just a normal Java program; it is the WebLogic managed server instance. When we run it, all of its own debug statements go to the screen, so we redirect them to a file instead. For our web application that runs in the server, we do have proper log management. The file for the WebLogic debug statements will just keep growing, so we need some log management for it. The content of this file is not as important as the log files for our application.

>> CURRENT_DATE=`date +%Y%m%d`

My code was copied from somewhere. I think it's a solution from someone in EE, but I can't remember now. Anyway, thanks for pointing that out!
 
LVL 10

Assisted Solution

by:Nukfror
Nukfror earned 50 total points
ID: 17897692
Ummmmm, why all the extra work that's in this:

...
Do stuff to make new file name
...
cp stdout.log stdout.${CURRENT_DATE}.log
echo "Previous contents archived at `date`" > stdout.log
gzip stdout.${CURRENT_DATE}.log

If you "do stuff to make new file name", why do the copy?  You've got a new file name, right?  So the application is writing to a new log file; why "copy" the old one before you compress it?  Of course, how you can change the name of a STDOUT-redirected file without stopping the application is magic to me ... call me crazy ...

A method that works well with syslog is:

mv stdout.log stdout.${CURRENT_DATE}.log
<signal the application to start a new log file>
[compress|gzip|bzip2|whatever] stdout.${CURRENT_DATE}.log

Unfortunately, the method above doesn't work well with a STDOUT-based log file, because you can't easily force a running application to close its STDOUT file descriptor and then reopen it with a log file name that is completely out of its control.  You have to write the application so that it opens STDOUT (or whatever) as a regular file, and create a signal handler that watches for, say, SIGUSR1; when it sees that signal, it closes the existing STDOUT file descriptor and opens it again to the same file name.  With STDOUT-redirected log files you almost can't force a new name, because the application writing the output has no control over the filename that you, the administrator, told the shell to redirect the output into.  You can control the log filename with STDOUT-redirected log files only if you stop the application, roll the log, and then restart the application.
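The reopen-on-signal idea can be demonstrated in plain shell, where exec re-points the process's own STDOUT; the file names app.log and app.old.log are made up for the demo:

```shell
# A toy "application" that reopens its own stdout when it receives USR1.
LOG=app.log
exec > "$LOG"                   # open stdout as a regular file
trap 'exec > "$LOG"' USR1       # on USR1: close and reopen stdout on $LOG

echo "before rotation"          # lands in the original app.log
mv "$LOG" app.old.log           # the rotator renames the live log ...
kill -USR1 $$                   # ... and tells the "application" to reopen
echo "after rotation"           # lands in the freshly created app.log
```

The same pattern is what syslogd-style daemons implement internally: rename the file out from under the writer, then signal it to reopen.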

Assuming you can't stop the application, you have to hope that the following works, every time:

<start application 10 hours ago or whatever in some other script>
....
<in my log rotation script>
cp stdout.log stdout.${CURRENT_DATE}.log
cp /dev/null stdout.log
echo "Previous contents archived at `date`" > stdout.log
gzip stdout.${CURRENT_DATE}.log

The problem is the STDOUT offset pointer.  If it doesn't reset to zero when you do the "cp /dev/null ...", the original contents of the log file will magically reappear.
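That offset behaviour is easy to demonstrate from the shell by holding a file descriptor open the way a redirected program would (demo.log is an illustrative name):

```shell
# fd 3 is opened once, without O_APPEND, so it keeps its own write offset.
exec 3> demo.log
echo "line one" >&3              # offset is now 9
echo "line two" >&3              # offset is now 18
cp /dev/null demo.log            # size drops to 0, but fd 3's offset stays 18
echo "line three" >&3            # written at offset 18, after a NUL-filled hole
exec 3>&-                        # close the descriptor
wc -c < demo.log                 # 29 bytes, not 11: the old size "reappears"
```

If the program had opened the log in append mode, every write would seek to end-of-file first and the truncation would behave as expected.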
 

Author Comment

by:integrosys
ID: 17904177
cp /dev/null stdout.log
echo "Previous contents archived at `date`" > stdout.log

Does the extra "cp /dev/null stdout.log" make any difference?
 
LVL 3

Expert Comment

by:jhartzen
ID: 17904201
No.
 
LVL 3

Expert Comment

by:jhartzen
ID: 17904206
Or rather: Yes:  It is extra wasted work.
 
LVL 48

Expert Comment

by:Tintin
ID: 17904344
To clear a file, the quickest way is

>file

or

mv file /dev/null

 
LVL 10

Expert Comment

by:Nukfror
ID: 17906877
mv file /dev/null ??!!??  NOT !!!!

If run as a non-root user, you'll get a permission error.  If you run that as root, the stability of your system will be at risk - DON'T DO THAT - use ">file" or "cp /dev/null file".

Yep, ">file" or "cp /dev/null file" are the quickest ways - but they don't always work how you expect, depending on how the log file is being written.  See my previous post.
 
LVL 48

Expert Comment

by:Tintin
ID: 17909131
Sorry, my brain was on holiday.

I meant

cp /dev/null file

 
LVL 3

Expert Comment

by:jhartzen
ID: 17909873
The point is: if you are about to write to a file, ie using
echo "some text" > file
this will truncate the file and write the text in one command.  Doing extra work to first truncate the file is wasted, irrespective of how efficient it is.  Worse, you have just added another window in which messages to the log from the Java program could be overwritten, eg between the time the truncate and the echo commands are executed.
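A quick illustration of that point (the file name is made up):

```shell
# The > redirection opens the file with O_TRUNC, so truncation and the write
# happen as part of a single command; no separate clearing step is needed.
printf 'old line 1\nold line 2\n' > archive_demo.log
echo "Previous contents archived at `date`" > archive_demo.log
wc -l < archive_demo.log          # exactly one line remains
```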

The only real solution here is to write a program which will manage the log files.  A script could do it, eg:
#!/bin/bash
# This script is a "stdin_log_handler"
# Courtesy J Hartzenberg
# Usage: Pipe logging output into this file.  Specify printf mask for log name on command-line, containing
#     five times the "%s" substitutor, for YEAR, MONTH, DAY, HOUR and MINUTE, eg
#
#   myJavaProg | stdin_log_handler /var/log/myJavaProg.%s-%s-%s-%s:%s.log
#
# Disclaimer A: This script is in serious need of error condition handling
# Disclaimer B: This script was tested on Linux (hence bash; it should work with ksh too)

# Define max file size (Nr of 512-byte disk blocks) before switching to new file.  With slightly more advanced scripting, you can check the KB size, but this is neat.
MAXBLOCKS=50
TEMPLINK=/tmp/current_log
NewLog() {
      [ -f $TEMPLINK ] && rm -f $TEMPLINK
      LOGFILE=$(printf "$1\n" $( date "+%Y %m %d %H %M" )|head -1 )
      date "+New log file started at %C" > "$LOGFILE" && ln -s "$LOGFILE" $TEMPLINK || exit
}

NewLog "$1"

while read LINE
do
   echo "$LINE" >> $TEMPLINK
   SIZE=$( ls -sL $TEMPLINK | awk '{print $1}')
   [ $SIZE -gt $MAXBLOCKS ] && NewLog "$1"
done
 
LVL 3

Expert Comment

by:jhartzen
ID: 17909980
I forgot about the compression of the previous log file.  This version is amended to compress existing log files.

Also, a handler is needed to allow multiple instances, eg for more than one program on the system.  Currently the soft link /tmp/current_log is the limitation; these could just become /tmp/current_log.$$

Also, one day when I have time, I think I will investigate the possibility of using file-handle re-assignment to simulate file opening/closing on the log file; then the "echo $LINE" will be replaced by the command cat on its own, and the read LINE will be replaced by the command "true".

#!/bin/sh

# This script is a "stdin_log_handler"
# Courtesy J Hartzenberg
# Usage: Pipe logging output into this file.  Specify printf mask for log name on command-line, containing five times %s, eg
#
#   myJavaProg | stdin_log_handler /var/log/myJavaProg.%s-%s-%s-%s-%s.log
#
# Disclaimer A: This script is in serious need of error condition handling
# Disclaimer B: This script was tested on Linux (hence bash; it should work with ksh too)

# Define max file size (Nr of 512-byte disk blocks) before switching to new file.  With slightly more advanced scripting, you can check the KB size, but this is neat.
MAXBLOCKS=50
TEMPLINK=/tmp/current_log
NewLog() {

      if [ -f $TEMPLINK ]
      then
         OLDLOG=$(ls -l $TEMPLINK | sed 's/^.*-> //')
         gzip $OLDLOG && rm $TEMPLINK || exit 1
      fi
      LOGFILE=$(printf "$1\n" $( date "+%H %M %S" )|head -1 )
      date "+New log file started at %C" > "$LOGFILE" && ln -s "$LOGFILE" $TEMPLINK || exit 1
}

NewLog "$1"

while read LINE
do
   echo "$LINE" >> $TEMPLINK
   SIZE=$( ls -sL $TEMPLINK | awk '{print $1}')
   [ $SIZE -gt $MAXBLOCKS ] && NewLog "$1"
done
 
LVL 3

Expert Comment

by:jhartzen
ID: 17910020
Errata: My apologies; in my testing I used only three parameters, HOUR, MINUTE, SECOND, and I copy-pasted that test script into this forum.  The line assigning the new LOGFILE name should read:

LOGFILE=$(printf "$1\n" $( date "+%Y %m %d %H %M" )|head -1 )

and the "%C" in the second date line should be a lower-case "%c"

It is way past my bedtime.
 
LVL 10

Expert Comment

by:Nukfror
ID: 17911789
jhartzen, this is pretty sweet !  Nice :)
 
LVL 10

Expert Comment

by:Nukfror
ID: 17911903
You just reminded me that I wrote a Perl program a long time ago for collecting Weblogic SNMP statistics to be used with Orca for graphing.  It "rolled" the data files at a specific time interval just like Orca did ... now you've got me planning to update it to also be able to roll the file based on the file size.

Also, not thumping your suggestion, because it is a good one, but your script will have to fork/exec ls and awk for *every* line of text - not to mention all the associated system calls each of these binaries will produce.  So, depending on how heavy the log traffic is, the task of simply determining when to "roll" the file could become computationally VERY expensive.  You could get rid of the awk fork/exec by changing the script to ksh or bash and using the built-in string handling capability to capture the output from "ls -sL" - which leaves just the ls command.

The nice thing about Perl is that it has a stat call, which in the end produces a single system call in Solaris to collect the file metadata, like the file size.

Now you've really got me thinking.
 
LVL 3

Expert Comment

by:jhartzen
ID: 17912549
Hi Nukfror,
Thank you for the compliment.  Last night, after I wrote this script, I was awake for a long time thinking of ways to improve it, exactly because of the inefficiency that you describe above!

One way would be to check the log file size only every 300 lines (or something like that).

Another would be to remove the check and add the following:

trap NewLog HUP

Then set up something in cron to check the file size every 10 minutes and, if necessary, send a HUP to the script.
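A minimal sketch of that trap idea; the names are illustrative, and where cron would deliver the HUP with something like pkill -HUP, this demo has the script signal itself just to exercise the handler:

```shell
# Reader loop that appends stdin to the current log; a HUP forces a rotation.
LOG=current.log
: > "$LOG"
rotate() { mv "$LOG" rotated.log; : > "$LOG"; }
trap rotate HUP

printf 'one\ntwo\n' |
while read -r LINE
do
   echo "$LINE" >> "$LOG"
done

kill -HUP $$          # in real use this comes from cron, not from the script
```

This removes the per-line size check entirely; the loop does nothing but append, and rotation cost is paid only when the signal arrives.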

The third method:  I have seen something in the past where there is something like a shell built-in which evaluates the size of a file.

For the second solution one can further improve the NewLog function with a mutex like this:
LOCKDIR=/var/locks/log_handler.$$
NewLog() {
until mkdir $LOCKDIR 2>/dev/null; do sleep 1; done
... do log switching here...
... Put gzip into background
rmdir $LOCKDIR
}

And, in all cases, I would use at least the following to allow multiple simultaneous instances: TEMPLINK=/tmp/log_handler.current.$$

A Perl script would be much more efficient, but I still have not gotten around to learning Perl yet!
 
LVL 10

Expert Comment

by:Nukfror
ID: 17914390
Actually, I was continuing to think about this last night myself.  For something like this, Perl would work just fine, if a little heavy on memory.  ksh and bash could do this without any fork/exec traffic at all if you use the features of each shell.

Both ksh and bash can easily determine the length of a string with ${#VARIABLE_NAME}.  So you can watch the size of the file by keeping a running total of the sizes of the lines you read, like this (I haven't tested this):

#!/bin/ksh
## Gotta use ksh or bash for this
# Define max file size (in bytes) before switching to new file.
MAXBYTES=1000000
TEMPLINK=/tmp/current_log

NewLog() {
      [ -f $TEMPLINK ] && rm -f $TEMPLINK
      LOGFILE=$(printf "$1\n" $( date "+%Y %m %d %H %M" )|head -1 )
      date "+New log file started at %C" > "$LOGFILE" && ln -s "$LOGFILE" $TEMPLINK || exit
      LOGSIZE=0
      LOGLINES=0
}

while getopts "l:" OPT; do
        case $OPT in
                l  ) MAXLINES=$OPTARG;;
                \? ) print 'option usage error'
                     return 1
        esac
done
shift $(($OPTIND - 1))

NewLog "$1"

while read LINE
do
   echo "$LINE" >> $TEMPLINK
   # LOGSIZE - always add 1 to total for each line to account for EOL character.  This obviously only applies to
   # UNIX EOL environments.  Windows would be +2.
   LOGSIZE=$(( ${LOGSIZE} + ${#LINE} + 1 ))
   LOGLINES=$(( ${LOGLINES} + 1 ))
   if [ "$MAXLINES" ]; then
      [ $LOGLINES -gt $MAXLINES ] && NewLog "$1"
   else
      [ $LOGSIZE -gt $MAXBYTES ] && NewLog "$1"
   fi
done
 
LVL 10

Expert Comment

by:Nukfror
ID: 17914392
And why I'm putting this much effort into a question worth 50 points I'll never know ...
 
LVL 10

Expert Comment

by:Nukfror
ID: 17914419
Forgot to add a switch for changing MAXBYTES:

#!/bin/ksh
## Gotta use ksh or bash for this
# Define max file size (in bytes) before switching to new file.
MAXBYTES=1000000
TEMPLINK=/tmp/current_log

NewLog() {
      [ -f $TEMPLINK ] && rm -f $TEMPLINK
      LOGFILE=$(printf "$1\n" $( date "+%Y %m %d %H %M" )|head -1 )
      date "+New log file started at %C" > "$LOGFILE" && ln -s "$LOGFILE" $TEMPLINK || exit
      LOGSIZE=0
      LOGLINES=0
}

while getopts "l:" OPT; do
        case $OPT in
                b ) MAXBYTES=$OPTARG;;
                l  ) MAXLINES=$OPTARG;;
                \? ) print 'option usage error'
                     return 1
        esac
done
shift $(($OPTIND - 1))

NewLog "$1"

while read LINE
do
   echo "$LINE" >> $TEMPLINK
   # LOGSIZE - always add 1 to total for each line to account for EOL character.  This obviously only applies to
   # UNIX EOL environments.  Windows would be +2.
   LOGSIZE=$(( ${LOGSIZE} + ${#LINE} + 1 ))
   LOGLINES=$(( ${LOGLINES} + 1 ))
   if [ "$MAXLINES" ]; then
      [ $LOGLINES -gt $MAXLINES ] && NewLog "$1"
   else
      [ $LOGSIZE -gt $MAXBYTES ] && NewLog "$1"
   fi
done
 
LVL 10

Expert Comment

by:Nukfror
ID: 17914539
Damn, missed something else:

#!/bin/ksh
## Gotta use ksh or bash for this
# Define max file size (in bytes) before switching to new file.
MAXBYTES=1000000
TEMPLINK=/tmp/current_log

NewLog() {
      [ -f $TEMPLINK ] && rm -f $TEMPLINK
      LOGFILE=$(printf "$1\n" $( date "+%Y %m %d %H %M" )|head -1 )
      date "+New log file started at %C" > "$LOGFILE" && ln -s "$LOGFILE" $TEMPLINK || exit
      LOGSIZE=0
      LOGLINES=0
}

while getopts "bl:" OPT; do
        case $OPT in
                b ) MAXBYTES=$OPTARG;;
                l  ) MAXLINES=$OPTARG;;
                \? ) print 'option usage error'
                     return 1
        esac
done
shift $(($OPTIND - 1))

NewLog "$1"

while read LINE
do
   echo "$LINE" >> $TEMPLINK
   # LOGSIZE - always add 1 to total for each line to account for EOL character.  This obviously only applies to
   # UNIX EOL environments.  Windows would be +2.
   LOGSIZE=$(( ${LOGSIZE} + ${#LINE} + 1 ))
   LOGLINES=$(( ${LOGLINES} + 1 ))
   if [ "$MAXLINES" ]; then
      [ $LOGLINES -gt $MAXLINES ] && NewLog "$1"
   else
      [ $LOGSIZE -gt $MAXBYTES ] && NewLog "$1"
   fi
done
 
LVL 3

Expert Comment

by:jhartzen
ID: 17920197
He he he.  This 50-point question could easily turn into something quite advanced.

I don't like my original printf/date construct any more - I think it is unwieldy.  I'd like to change the function definition as follows (based on your last version):

#!/bin/ksh
## Gotta use ksh or bash for this
# Define max file size (in bytes) before switching to new file.
MAXBYTES=1000000
TEMPLINK=/tmp/current_log.$$

LOCKDIR=/var/locks/log_handler.$$
NewLog() {
      # Remember the real filename of the old (current) log file
      OLDLOG=$(ls -l $TEMPLINK 2>/dev/null | sed 's/^.*-> //')

      # Prevent multiple instances from entering this part of the script
      until mkdir $LOCKDIR 2>/dev/null; do sleep 1; done
      [ -f $TEMPLINK ] && rm -f $TEMPLINK
      LOGFILE=$(date "+$1")
      date "+New log file started at %c" > "$LOGFILE" && ln -s "$LOGFILE" $TEMPLINK || exit
      rmdir $LOCKDIR

      # Reset counters
      LOGSIZE=0
      LOGLINES=0
      # Compress the old LOG file in the background
      [ -n "$OLDLOG" ] && gzip "$OLDLOG" &
}

MANUAL_LOGROTATING=0
while getopts "bl:" OPT; do
        case $OPT in
                b ) MAXBYTES=$OPTARG;;
                l  ) MAXLINES=$OPTARG;;
                m ) MANUAL_LOGROTATING=1;;
                \? ) print "$0 usage error
Usage:
$0 [ -b | -l | -m ] Logfilemask

Logfilemask is a string containing %H %M %S %Y %m %d etc.  This string will
be passed to 'date' to build up a file name.
Note:
  $0 reads log lines from standard output, and writes the lines to a new file.  The
  file name is generated based on the mask.  See man date(1) for more info. On
  log switch, the previous log file will be compressed using gzip."
                     return 1
        esac
done
shift $(($OPTIND - 1))

# A Log file must be specified on the command line
LOGFILE="$*"
[ ${#LOGFILE} -gt 0 ] || exit 1

# Allow the user to schedule Log file cycling through CRON - LOG will be cycled when a HUP is sent to this script.
[ $MANUAL_LOGROTATING == 1 ] && trap NewLog HUP

NewLog "$LOGFILE"

while read LINE
do
   echo "$LINE" >> $TEMPLINK
   if [ $MANUAL_LOGROTATING == 0 ]
   then
      # LOGSIZE - always add 1 to total for each line to account for EOL character.  This obviously only applies to
      # UNIX EOL environments.  Windows would be +2.
      LOGSIZE=$(( ${LOGSIZE} + ${#LINE} + 1 ))
      LOGLINES=$(( ${LOGLINES} + 1 ))
      if [ "$MAXLINES" ]; then
         [ $LOGLINES -gt $MAXLINES ] && NewLog "$LOGFILE"
      else
         [ $LOGSIZE -gt $MAXBYTES ] && NewLog "$LOGFILE"
      fi
    fi
done
 
LVL 10

Expert Comment

by:Nukfror
ID: 17921311
Change the while loop to this (I forgot to add colons to mark the options as expecting arguments, and you forgot to add a switch indicator for the "m" switch you added):

<snip>
MANUAL_LOGROTATING=0
while getopts "b:l:m" OPT; do
        case $OPT in
                b ) MAXBYTES=$OPTARG;;
                l  ) MAXLINES=$OPTARG;;
                m ) MANUAL_LOGROTATING=1;;
                \? ) print "$0 usage error
<snip>
 
LVL 3

Expert Comment

by:jhartzen
ID: 17922436
Serves us right for not testing :-)
 

Author Comment

by:integrosys
ID: 17962124
>> And why I'm putting this much effort into a question worth 50 points I'll never know ...

I am quite impressed by the enthusiasm shown by you guys. But I don't need something that complicated. I just need something simple, and it does not need to be perfect. I am not even going to try out your log file management solutions, because I have more important tasks to attend to. Sorry if that disappoints you! I would love to give everyone a lot of points, but my available points are now quite limited because I have already stopped my subscription. Whatever points I have now will be saved for any question I need to ask in the future. But I will still increase the points to 100 and split them between jhartzen and Nukfror for your enthusiasm and commitment to this question. Thank you guys, and Tintin, for your participation.