Edgar Cole
There's an infinite loop in the application startup script on my PowerHA cluster!?
The application startup script in /opt/PowerHA/bin on my cluster node looks like this:
#!/bin/ksh
#stub script to start the Lawson application
banner "Rebuilding print queues"
/usr/local/bin/rebuildqueues
banner "Starting Lawson Application"
while true
do
date > /tmp/LawsonStart.out 2>&1
sleep 300
done
If I’m not mistaken, the ‘while’ loop creates an infinite loop. I don’t see anything within that loop that will terminate it. I don’t know what in the failover process might depend on the startup script, but I don’t want to risk HACMP waiting indefinitely for it to finish. Do I have a legitimate concern?
ASKER
I believe that the loop was implemented to wait for other processes to finish. Those processes have since been removed. That's why the script is so brief.
I was not involved in the development of this script, and just happened to notice the loop when I went to modify it. I don't remember that statement being in previous production versions. I've pointed out to my colleagues what the effect of that statement is, but they told me that's how it was tested. Whatever that loop was originally intended to do, it probably didn't. I have recommended that it be removed. I will forward your comments to them.
The rebuildqueues script, on the other hand, is mine. It does not run in the background; if it did, I would have used the wait command instead.
Once again, thank you
Edgar
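For readers following along, here is a minimal sketch of what the stub could look like with the loop removed, plus the wait idiom mentioned above. It is not the production script. It assumes rebuildqueues signals failure through a non-zero exit status, uses echo in place of banner for portability, and introduces two hypothetical override variables (REBUILD_CMD and LAWSON_OUT) purely so the sketch can be tried on a machine without the real paths. rebuildqueues is pushed into the background here only to illustrate wait; the real one runs in the foreground.

```shell
#!/bin/ksh
# Sketch only: start script without the endless loop.
# REBUILD_CMD and LAWSON_OUT are hypothetical overrides for experimenting;
# the defaults are the paths from the original script.

start_lawson() {
    rebuild_cmd=${REBUILD_CMD:-/usr/local/bin/rebuildqueues}
    out_file=${LAWSON_OUT:-/tmp/LawsonStart.out}

    echo "Rebuilding print queues"
    # Backgrounded purely to show the wait idiom.
    "$rebuild_cmd" &
    # wait blocks until that job ends and returns its exit status.
    wait $!
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "rebuildqueues failed with status $rc" >&2
        return "$rc"
    fi

    echo "Starting Lawson Application"
    date > "$out_file" 2>&1
    return 0
}
```

A real start script would end with a call to start_lawson followed by exit $?, so that it returns to the cluster manager instead of sleeping forever.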
One remark on the "config_too_long" event:
There is an option in HACMP to configure the "Application Startup Mode" as "background" instead of "foreground".
If, and only if, you set this option to "background", the "node_up" event will not be delayed, so the "config_too_long" event will not be triggered and the messages I described above will not appear.
Your start script will keep running in the background forever, but the cluster will treat it as if it had completed successfully.
ASKER
Hmm, I'm not finding it.
ASKER
Yep. It might not be included in version 5.5. I got an error when I tried running SMIT with that fastpath.
the cluster startup will never come to an end.
The startup script is the very last step in resource group initialization,
so all the cluster functionality will be available nonetheless,
but the cluster manager will not consider your application as being "up".
This, in turn, will lead to countless messages in hacmp.out like
"Cluster has been running event xxx for nnn seconds",
where xxx can be "reconfig_resource_complet
or "rg_move_complete" or similar.
(That's the "config_too_long" event).
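A quick way to see whether a node is already logging such messages is to count them in hacmp.out. The helper below is a sketch: the function name count_too_long is made up for illustration, the search string is taken from the message text quoted above, and the log path is passed in because the location of hacmp.out differs between releases.

```shell
# count_too_long LOGFILE
# Prints the number of lines in LOGFILE that contain the
# "has been running" phrase from the config_too_long warning.
count_too_long() {
    grep -c "has been running" "$1"
}
```

For example, count_too_long /var/hacmp/log/hacmp.out, assuming that is where your release writes the log. Check your own cluster's log directory first.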
In short, the infinite loop in the script makes no sense at all.
Could it be that the "rebuildqueues" process forks itself into background and the loop is meant to wait for it to finish?
If so, you should try to find out how to programmatically check for its completion, so you can add a reasonable break condition.
By the way, is this "rebuildqueues" the only thing that's needed to start the application, or is there more in the script than what you posted?
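To illustrate what a "reasonable break condition" might look like, here is a minimal sketch of a polling loop that gives up after a timeout instead of running forever. The function name wait_for_done and its arguments are hypothetical, and the completion check itself is whatever command you find that reports the background work as finished.

```shell
# wait_for_done CHECK_CMD [TIMEOUT_SECS] [INTERVAL_SECS]
# Runs CHECK_CMD every INTERVAL_SECS until it succeeds or TIMEOUT_SECS
# have elapsed. Returns 0 on success, 1 on timeout.
wait_for_done() {
    check_cmd=$1
    timeout=${2:-600}
    interval=${3:-10}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Break condition: the check command reports completion.
        if eval "$check_cmd"; then
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}
```

A call might look like wait_for_done "test -f /tmp/rebuildqueues.done" 600 10, where the marker file is an invented example. Either way the start script returns within a bounded time, so the cluster event can complete.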