jayatallen

asked on

Prevent a shell script from running twice on Red Hat Linux 5

Hi Folks,

I have a ksh shell script named "appcrt" which takes arguments from the command line and runs managed servers. The script first checks whether the process is already running; if it is, the script exits. The script uses a while true loop to start the managed server, so if the managed server crashes, the script that started it will start it again.
The script takes 3 arguments: first the environment, then the managed server name, and then the action (start/stop).

I have one admin server and 3 managed servers. So, if I need to start the admin server, I would type
$appcrt prd jasdom_a1 start

So the script goes as far as the RUNNING command (please see the script below) and checks whether "appcrt prd jasdom_a1 start" is already running by counting matches with ps -ef | grep -c "appcrt prd jasdom_a1 start". If the count is greater than 1, it exits; otherwise it starts the managed server.

This script has been running fine on Solaris 10, but I am trying to get it working on Red Hat Linux 5 and it is causing issues.
For some reason, on Linux, when I run the command to start the admin server using:
$appcrt prd jasdom_a1 start
bash-3.2$ ./appcrt prd jasdom_a1 start
jasadm 25042 18332 0 09:38 pts/0 00:00:00 /bin/ksh ./appcrt prd jasdom_a1 start
jasadm 25794 25042 0 09:38 pts/0 00:00:00 /bin/ksh ./appcrt prd jasdom_a1 start

It returns 2 processes for the RUNNING check (for debugging, I am echoing the RUNNING1 output), while on Solaris it returns 1, which is what we expect.

Could you please help me find the issue?

SCRIPT:
#! /bin/ksh
umask 022

PATH=/usr/local/bin:/bin:/usr/bin
RUNAS=jasadm
APP=jas
HOSTNAME=`/bin/hostname`;

while getopts :d arg1
do
  case $arg1 in
      d) DEBUG=1;;
  esac
done
shift OPTIND-1

if test -n "$1" && test -n "$2" && test -n "$3" ; then
    ENV_NAME=$1
    INSTANCE=$2
    ACTION=$3
else
    echo Usage:
    echo
    echo "      $0 [-d] [environment] [instance] [start|stop|restart] "
    echo
    echo Some examples:
    echo
    echo "      $0 prd jasdom_a1 start"
    echo "      $0 prd jasdom_m1 start"
    echo "      $0 prd jasdom_m2 start"
    exit 1
fi

DOMAIN=$(print $INSTANCE|awk -F\_ '{print $1}')
JAVA_HOME=/apps/jas/prd/wl9config/jdk150_12
BEA_HOME=/apps/jas/prd/wl9config/
WL_HOME=$BEA_HOME/weblogic92
CONFIG_HOME=/apps/$APP/$ENV_NAME/wl9config/$DOMAIN


JVM="java"
JVM_TYPE="-hotspot"
JVM_TYPE="-server"
JVM_MEM="-ms128m -mx128m -XX:MaxPermSize=32m -XX:NewSize=32m"



CP=$WL_HOME/server/lib/weblogic_sp.jar:$WL_HOME/server/lib/weblogic.jar:$WL_HOME/server/lib/webservices.jar
POST_CP=$JAVA_HOME/lib/tools.jar

CLASSPATH=$CP:$POST_CP

case "$ENV_NAME" in
qa)
;;
prd)
;;
esac


PATH=$WL_HOME/server/bin:$JAVA_HOME/jre/bin:$JAVA_HOME/bin:$PATH
STARTMODE=true
WLS_USER=weblogic
WLS_PW=weblogic1
export WLS_USER WLS_PW STARTMODE PATH CLASSPATH LD_LIBRARY_PATH
export ENV_NAME ENV_HOME INSTANCE
export BEA_HOME JAVA_HOME L


ulimit -n 1024


case "$INSTANCE" in

jasdom_a1)
    PORT=7210
    JVM_MEM="-ms1024m -mx1024m"
    SERVER_TYPE=admin
    HOST=prdcd1-jaswap01.svr.us.xcrom.net
;;
jasdom_m1)
    PORT=7211

    JVM_MEM="-ms2048m -mx2048m"
    ADMINURL=prdcd1-jaswap01.svr.us.xcrom.net:7210
    SERVER_TYPE=managed
    HOST=prdcd1-jaswap01.svr.us.xcrom.net

;;
jasdom_m2)
    PORT=7212
#JVM_MEM="-ms1024m -mx1024m
    JVM_MEM="-ms1024m -mx1024m"
    ADMINURL=prdcd1-jaswap01.svr.us.xcrom.net:7210
    SERVER_TYPE=managed
    HOST=prdcd1-jaswap01.svr.us.xcrom.net
;;
jasdom_m5)
    PORT=7213
    JVM_MEM="-ms2048m -mx2048m"
    ADMINURL=prdcd1-jaswap01.svr.us.xcrom.net:7210
    SERVER_TYPE=managed
    HOST=prdcd1-jaswap01.svr.us.xcrom.net

;;
*)
    echo $0: Error: Unknown environment/application combination.
    exit 1
esac

JAVA_STOP_COMMAND="$JVM weblogic.Admin -url $HOST:$PORT FORCESHUTDOWN -username $WLS_USER -password $WLS_PW"
JAVA_PING_COMMAND="$JVM weblogic.Admin -url $ADMINURL  -username $WLS_USER -password $WLS_PW ping "

case $ACTION in
start)

    cd $WL_HOME
    STRING2ADD=" "

    # Special checks for managed instances
    #
    if [ "$SERVER_TYPE" = "managed" ]
    then
        STRING2ADD=" -Dweblogic.management.server=${ADMINURL}"

        RS=$($JAVA_PING_COMMAND 2>&1)

        if [ $(print $RS | grep -c "RTT = ") -eq 0 ]
        then
                print "
                        ==================================
                        Error : Admin Server unavailable.
                        ==================================
                        Status          : Admin Server UNREACHABLE
                        Action          : Aborting this script...
                        ========================================================="
                exit
        fi

    else
        STRING2ADD=" "
    fi

    RUNNING=`/bin/ps -ef | /bin/egrep -c "appcrt[ \t]+(-d[ \t]+)?$ENV_NAME[ \t]+$INSTANCE[ \t]+start"`
    RUNNING1=`/bin/ps -ef | /bin/egrep  "appcrt[ \t]+(-d[ \t]+)?$ENV_NAME[ \t]+$INSTANCE[ \t]+start"`
    echo $RUNNING1
    if [ "$RUNNING" -gt 1 ]; then
        echo $0: The start script for this server is still running,
        echo $0: and will restart weblogic automatically if it exits.
    else
        echo WebLogic output redirected to $WL_OUT
        (
        while true
        do
               
                WL_ARGS="$JVM_TYPE -showversion $JVM_MEM -classpath $CLASSPATH \
                $DEBUG_ARGS \
                $WL_OPTION \
                -client \
                -verbose:gc \
                -XX:+PrintGCTimeStamps \
                -XX:+PrintGCDetails \
                -XX:SurvivorRatio=8 \
                -XX:CompileThreshold=8000 \
                -XX:PermSize=48m \
                -XX:MaxPermSize=128m \
                -Xverify:none \
                -da \
                -Dibportal.version=$ENV_NAME \
                -Dibportal.logDir="$LOG_HOME/" \
                -Djava.awt.headless=true \
                -Duser.home=$WEBAPP_HOME/WEB-INF/config \
                -Dweblogic.RootDirectory=$CONFIG_HOME \
                -Dweblogic.Name=$INSTANCE \
                -Dbea.home=$BEA_HOME \
                -Dweblogic.management.username=$WLS_USER \
                -Dweblogic.management.password=$WLS_PW $STRING2ADD \
                -Dweblogic.ProductionModeEnabled=$STARTMODE \
                -Djava.security.policy=$WL_HOME/server/lib/weblogic.policy \
                -Dplatform.home=/apps/jas/prd/wl9config/weblogic92 \
                -Dweblogic.management.discover=true \
                weblogic.Server"

            MESSAGE="`date +'<%b %d, %Y %l:%M:%S %p  %Z>'` <Alert> <startWebLogic.sh> <Starting webLogic $INSTANCE> <$HOSTNAME>"

            nohup  $JVM $WL_ARGS >> /dev/null 2>&1

            RETURN_CODE=$?
            MESSAGE="`date +'<%b %d, %Y %l:%M:%S %p  %Z>'` <Alert> <startWebLogic.sh> <Server exited with code: $RETURN_CODE>"
            sleep 3
        done
        )&
    fi
    cd $CONFIG_HOME
;;
stop)
    SCRIPT_PID=`/bin/ps -ef | /bin/egrep "appcrt[ \t]+(-d[ \t]+)?$ENV_NAME[ \t]+$INSTANCE[ \t]+start" | /usr/bin/perl -e 'print (( split /\s+/, <>)[1])'`
    if [ -n "$SCRIPT_PID" ]; then
        kill -TERM $SCRIPT_PID > /dev/null 2>&1
        RETURN_CODE=$?
    fi
    echo $PORT
    echo $HOST
    $JAVA_STOP_COMMAND
;;
*)
    echo $0: Error: Action "$ACTION" is not supported.
esac




Thank you,
arnold

You should use PID files, e.g. /var/run/instance.pid.
Check that the file is absent before starting the instance.

Is the instance name part of the output of ps -ef | grep jasdom?
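A minimal sketch of the PID-file check suggested above, written for ksh (it also runs under bash). The file location, its naming, and the function names acquire_pidfile and release_pidfile are hypothetical; /var/run usually needs root, so a directory writable by jasadm may be needed instead. With noclobber (set -C), the redirection refuses to overwrite an existing file, so checking for the file and creating it happen in a single redirection rather than as a separate test followed by a write.

```shell
# acquire_pidfile PIDFILE
# Writes this script's PID ($$) into PIDFILE unless the file already exists.
# Returns 0 if the file was created, 1 if another instance appears to own it.
acquire_pidfile() {
    pidfile=$1
    # noclobber makes '>' fail on an existing file; running it in a subshell
    # keeps the option from affecting the rest of the script.
    if ( set -C; echo "$$" > "$pidfile" ) 2>/dev/null; then
        return 0
    fi
    echo "$0: $pidfile already exists (PID $(cat "$pidfile" 2>/dev/null)), not starting." >&2
    return 1
}

# release_pidfile PIDFILE
# Removes PIDFILE, e.g. from the stop action.
release_pidfile() {
    rm -f "$1"
}

# Example use at the top of the start action (file name is hypothetical):
#   PIDFILE=/var/run/appcrt.$ENV_NAME.$INSTANCE.pid
#   acquire_pidfile "$PIDFILE" || exit 1
```

Because appcrt's long-lived process is the backgrounded while loop rather than the script itself, a stop action would probably want the loop's PID, which is available in $! immediately after the ( ... )& line. A file left behind by a crash will also block a restart until it is removed.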
jayatallen

ASKER

The instance part is jasdom_a1.
If I chose to start managed server 1, I would use
$appcrt prd jasdom_m1 start

and then INSTANCE would be jasdom_m1.
I wonder why it's not working on Linux; it has been running on Solaris and is a reliable script.
Is it possible to find out why ps -ef | grep "appcrt prd $INSTANCE start" returns 2, or why the shell script is being run twice when I ran it only once?
What shell are you running this under in Solaris?
ASKER CERTIFIED SOLUTION
simon3270
ksh. I tried that on Linux too.
I mean, I changed my shell using
$/bin/ksh
then pressed Enter
and ran the script. The output still shows 2 processes. As you can see, I am echoing RUNNING1 above and it shows 2 processes;
one of them is a child process.
CODE FROM SCRIPT:
    RUNNING=`/bin/ps -ef | /bin/egrep -c "appcrt[ \t]+(-d[ \t]+)?$ENV_NAME[ \t]+$INSTANCE[ \t]+start"`
    RUNNING1=`/bin/ps -ef | /bin/egrep  "appcrt[ \t]+(-d[ \t]+)?$ENV_NAME[ \t]+$INSTANCE[ \t]+start"`
    echo $RUNNING1

Result when I run the script:
bash-3.2$ ./appcrt prd jasdom_a1 start
jasadm 25042 18332 0 09:38 pts/0 00:00:00 /bin/ksh ./appcrt prd jasdom_a1 start jasadm 25794 25042 0 09:38 pts/0 00:00:00 /bin/ksh ./appcrt prd jasdom_a1 start

I don't understand why the shell executed the script twice.
Thank you, Simon.

As you can see, the script uses while true to start the managed server; if the managed server crashes for some reason, "appcrt prd INSTANCE start" will start it again. If I want to stop the managed server, I stop it using "appcrt prd INSTANCE stop", which kills both the appcrt script and the managed server.

I have tried your suggestion of having the script check for
  if [ "$RUNNING" -gt 2 ]; then

The problem is that I then have 2 appcrt processes running (after I run the appcrt script), and both processes start a separate managed server. So the end result is that I end up with 2 managed server processes.

One more thing: after a few seconds, one of the appcrt processes goes away, and so does 1 managed server.
But this is not reliable.
You can probably leave the $RUNNING check as "-gt 1", since that will catch any time when the start script is actually running (given that the script has two processes in the process list).

If you do end up with two managed processes, the second one probably dies because it is trying to use a resource (e.g. listening on an IP address) which the first one is already using.
Hi Simon,

thanks for your reply.
If I leave the $RUNNING check as "-gt 1",
the script will exit, because on Linux it creates 2 processes for itself.
Is there a way to stop the shell from creating a child process of its own?
I mean, when I type "appcrt prd jasdom_a1 start" and then run ps -ef | grep "appcrt prd jasdom_a1 start", it should show only 1 process in the output.
You would need to add logic to your script. It can be simple: since you are already piping the data from grep to perl, you can use the perl script to output only the line for the child process.
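A sketch of that kind of filtering, done in awk rather than perl and assuming a ps that accepts the POSIX options -o pid= -o ppid= -o args=: keep every process whose arguments match the pattern, then drop any match whose parent is itself a match. A copy of the script forked by the script has the script as its parent, so only top-level instances are left. The function name top_level_pids is hypothetical; the pattern uses " +" because ps separates arguments with single spaces, and on Solaris nawk or /usr/xpg4/bin/awk may be needed for -v.

```shell
# top_level_pids PATTERN
# Reads "PID PPID ARGS..." lines on standard input (for example from
# "ps -e -o pid= -o ppid= -o args=") and prints the PID of every process
# whose ARGS match the extended regex PATTERN, skipping any match whose
# parent is itself a match (a forked copy of the same script).
top_level_pids() {
    awk -v pat="$1" '
        {
            args = $0
            sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+/, "", args)   # drop PID and PPID
            if (args ~ pat) {
                ppid_of[$1] = $2          # remember each matching process
                order[++n] = $1           # and keep the input order
            }
        }
        END {
            for (i = 1; i <= n; i++) {
                pid = order[i]
                # Print it only if its parent is not another matching process.
                if (!(ppid_of[pid] in ppid_of)) print pid
            }
        }'
}

# Example use in the start section of appcrt:
#   RUNNING=$(ps -e -o pid= -o ppid= -o args= |
#       top_level_pids "appcrt +(-d +)?$ENV_NAME +$INSTANCE +start" | wc -l)
#   if [ "$RUNNING" -gt 1 ]; then
#       echo "$0: The start script for this server is still running,"
#   fi
```

With this filter the threshold should be able to stay at "-gt 1" on both systems, since a forked child of the script is never counted, while a still-running restart loop from an earlier start (whose parent is no longer a matching script) is.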

Why not send the application into the background or, if you want it to be restarted when it exits, enclose it in an infinite while loop:

while true ; do
    # Do your logic and start the application here, in the foreground.
    # As long as the start command does not put the application into the
    # background, the loop waits here. As soon as the application exits or
    # crashes, execution moves on and the loop restarts it from the top.
    sleep 3    # short pause before restarting
done
   
Sorry, yes, I confused myself - you do need "-gt 2", since when the first command runs, $RUNNING will be 2, but when the second one runs it will be 4.
Thank you guys for your replies. I think I didn't state my question clearly.
Arnold:
I am using appcrt to start other Java processes (JVMs). appcrt takes arguments and works accordingly.
When I pass the arguments (basically the name of the JVM and the action), appcrt checks whether it is already running; if not, it starts the given JVM and puts that JVM in the background.
If you look at the start case in the script:

            RETURN_CODE=$?
            MESSAGE="`date +'<%b %d, %Y %l:%M:%S %p  %Z>'` <Alert> <startWebLogic.sh> <Server exited with code: $RETURN_CODE>"
            sleep 3
        done
        )&

Because of the &, appcrt starts the JVM (inside the restart loop) in the background.
On Solaris, it works perfectly.
If I start a managed server (JVM) on Solaris, I end up with two processes:
1) appcrt prd <INSTANCE> start
2) the java process, which was started by the above process in the background
If for some reason the JVM crashes, process 1 will start it again because of the while true loop.

So, if I want to stop/kill the JVM, I use appcrt with the stop action to stop both.
This logic has been working fine.
The only problem is on Linux, because when I start the script, Linux creates 2 similar processes.

For instance, suppose nothing is running on the Linux box and I want to start a managed server. I would type
$appcrt prd jasdom_m1 start

Since nothing was running, the count from
ps -ef | grep "appcrt prd jasdom_m1 start" should be 1. As this is the only instance, the script will proceed and start the managed server (JVM) in the background.

ISSUE:
On Linux, the check ps -ef | grep "appcrt prd jasdom_m1 start" returns 2 (the main problem).
Logic in Script:
CODE FROM SCRIPT:
    RUNNING=`/bin/ps -ef | /bin/egrep -c "appcrt[ \t]+(-d[ \t]+)?$ENV_NAME[ \t]+$INSTANCE[ \t]+start"`
    RUNNING1=`/bin/ps -ef | /bin/egrep  "appcrt[ \t]+(-d[ \t]+)?$ENV_NAME[ \t]+$INSTANCE[ \t]+start"`
    echo $RUNNING1

For my own understanding, I am echoing RUNNING1. I don't understand why Linux shows the output below, saying that 2 appcrt processes are running.

Result when I run the script:
bash-3.2$ ./appcrt prd jasdom_a1 start
jasadm 25042 18332 0 09:38 pts/0 00:00:00 /bin/ksh ./appcrt prd jasdom_a1 start jasadm 25794 25042 0 09:38 pts/0 00:00:00 /bin/ksh ./appcrt prd jasdom_a1 start

So, is the problem that the Linux shell is running the script twice? I don't know why it's doing that.

Please suggest.
It is not running it twice. One "appcrt prd jasdom_a1 start" entry (PID 25042) is the call to run the shell script; the other (PID 25794) is the script itself being processed. You will see that the parent process ID (the third column of the "ps" output) of the second entry is the same number as the process ID (the second column) of the first entry: one is the parent of the other.

That's just the way Linux does it.
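Another way to get a check that does not depend on how many processes the shell forks, sketched here assuming a ps that accepts -o pid= -o ppid= -o args=: ignore the script itself and everything descended from it, and treat any remaining match as another running copy. It relies on $$, which in ksh and bash keeps the main script's PID even inside subshells and command substitutions. The function name other_instances is hypothetical; it reads "PID PPID ARGS" lines on standard input so it can be tried on saved ps output.

```shell
# other_instances PATTERN MYPID
# Reads "PID PPID ARGS..." lines on stdin. Prints every line whose ARGS match
# the extended regex PATTERN, except MYPID itself and its descendants (for
# example the subshells the script forks for command substitutions).
other_instances() {
    awk -v pat="$1" -v me="$2" '
        {
            parent[$1] = $2
            args = $0
            sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+/, "", args)   # drop PID and PPID
            if (args ~ pat) { n++; line[n] = $0; lpid[n] = $1 }
        }
        END {
            for (i = 1; i <= n; i++) {
                p = lpid[i]; mine = 0
                # Walk up the parent chain (bounded) looking for MYPID.
                for (k = 0; k < 64 && p != "" && p > 1; k++) {
                    if (p == me) { mine = 1; break }
                    p = parent[p]
                }
                if (!mine) print line[i]
            }
        }'
}

# Example use in appcrt ($$ is the script's own PID, even inside $( )):
#   OTHERS=$(ps -e -o pid= -o ppid= -o args= |
#       other_instances "appcrt +(-d +)?$ENV_NAME +$INSTANCE +start" "$$")
#   if [ -n "$OTHERS" ]; then
#       echo "$0: The start script for this server is still running,"
#   fi
```

Walking the whole ancestry, rather than only checking the direct parent, also skips grandchildren such as a forked pipeline member that has not yet exec'd its program.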
Is there a way to stop the shell from creating a child shell?
I tried to run the script in the current shell, rather than forking a new child shell, like this:
$. ./appcrt jasdom_a1 start

(dot, whitespace, and then the script).
This causes weird behavior: the shell starts and terminates the process, and keeps doing it until I kill the ksh process that was used to start the script.
Rather than try to work around this, just modify your script to accept the way Linux works.

If you need the same script to work on Linux and Solaris, set a variable to 2 if uname reports Linux and to 1 otherwise, then compare $RUNNING against that.
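Simon's suggestion could look like this. It is a sketch that assumes the thresholds observed in this thread (2 on Linux, 1 on Solaris) hold on every host; uname -s prints Linux on Linux systems and SunOS on Solaris. The function name max_processes is hypothetical.

```shell
# max_processes
# Prints the RUNNING threshold for this OS: 2 on Linux (where the thread saw
# the start script listed twice) and 1 elsewhere (e.g. Solaris).
max_processes() {
    case "$(uname -s)" in
        Linux) echo 2 ;;
        *)     echo 1 ;;
    esac
}

# Example use in the start section of appcrt:
#   MAX_PROCESSES=$(max_processes)
#   if [ "$RUNNING" -gt "$MAX_PROCESSES" ]; then
#       echo "$0: The start script for this server is still running,"
#   fi
```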
I've requested that this question be deleted for the following reason:

No specific answer was provided; it will confuse others.
It's not confusing, just different.  If you try to assume that all systems are the same (e.g. Solaris and Linux), you will be bitten by this and other differences.

The solution is to accept that Linux creates two processes and code for that (by changing to "-gt 2", and the change in the "stop" section).
Hi Simon,

I didn't mean to offend you, but that was the issue. If I use -gt 2, I end up with two processes. I didn't find an answer on how to make this script run as one process only.
The only way to make it work is to take the while true section out, but then the script is no longer good for restarting the process automatically.

The suggestion provided helps but doesn't eliminate the original issue.
You can only eliminate the issue by eliminating one of the environments.

The solution I employed in a similar situation is as follows: have two different configs, one for Solaris and one for Linux, in which you may have
MAX_PROCESSES=2

on Linux, and
MAX_PROCESSES=1

on Solaris.
Source the config via
. config.sh

and use
if [ "$RUNNING" -gt "$MAX_PROCESSES" ]; then

But the healthiest way remains to use PID files.
The script is running two processes because that's the way Linux organises it. As you suggest, the while loop seems to be the trigger for this.

The two processes are not simply two versions of the same program running - they are a parent+child pair, so that one runs the script itself, and the other looks after the backgrounded while loop.
PID files could, as parparov suggests, be a more reliable way of finding the original program, but they do suffer from the problem that the PID alone does not identify a process. For example, a process creates its PID file, runs for a very long time, then crashes, leaving its PID file still present. Since PIDs come from a limited range (by default, PIDs on Linux stay below 32768), it is possible that the PIDs have wrapped round and the same PID has been allocated to some new process, entirely unrelated to the original one. This may seem like a theoretical problem, but I have been bitten by it in the past (on a system doing enormous numbers of compilations, where the PID wrapped round twice per day).
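One way to narrow that risk, sketched under the assumption that the PID file holds a single numeric PID: before trusting the file, check that the process still exists and that its arguments still look like the expected program, using ps -p PID -o args=. A PID that has wrapped round to an unrelated program then fails the check; a reused PID whose new owner happens to have matching arguments would still pass. The function name pid_is_instance is hypothetical, and grep -E may need /usr/xpg4/bin/grep on Solaris.

```shell
# pid_is_instance PIDFILE PATTERN
# Succeeds (returns 0) only if PIDFILE contains the PID of a live process
# whose argument list matches the extended regex PATTERN. A missing or
# malformed file, a dead process, or a PID reused by some unrelated program
# all make it return 1.
pid_is_instance() {
    pidfile=$1
    pattern=$2
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;    # empty or not a plain number
    esac
    # A non-zero status or empty output means there is no such process.
    args=$(ps -p "$pid" -o args= 2>/dev/null) || return 1
    [ -n "$args" ] || return 1
    printf '%s\n' "$args" | grep -E "$pattern" >/dev/null
}

# Example use before starting an instance (file name is hypothetical):
#   if pid_is_instance "/var/run/appcrt.$ENV_NAME.$INSTANCE.pid" \
#          "appcrt +(-d +)?$ENV_NAME +$INSTANCE +start"; then
#       echo "$0: $INSTANCE is already running."; exit 0
#   fi
```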