Edgar Cole
asked on
Running a background process from a Korn shell function
I have a Korn shell script containing the following code:
setAlarm() {
set -x
( sleep $TIMEOUT; kill -USR1 $$ ) &
echo=$!
return
}
This function is called in the following statement:
[ "$TIMEOUT" ] && ALARM=`setAlarm $TIMEOUT`
When I run the script with the debug flag, I get the following:
+ [ 60 ]
+ + setAlarm 60
+ sleep 60
+ echo=11075826
+ kill -USR1 7733266
### SIXTY SECOND PAUSE ###
ALARM=
+ soundAlarm
+ /home/155477/NetBackup/pre -exec 120 0
The problem is that between the statement "kill -USR1 7733266" and "ALARM=," there's a sixty second pause. My expectation is that the ALARM variable assignment would happen immediately after the setAlarm function returns. Not only is that statement not executed when expected, but the assignment is null!? The setAlarm function returns the PID of the command group that it runs in the background. It is that value that should get assigned to the variable named ALARM. Not only does the code wait for a function that appears to return immediately, but the value that function returns is lost.
<< a function that appears to return immediately >>
The function if called "alone" would indeed return immediately, but might not be able to find the process to kill with USR1 (this parent process might well have ended in the meantime).
When used in a variable assignment via command substitution the assignment can only take place when everything which is needed for it has finished.
The shell has to wait for everything inside the function to finish in order to be able to decide what the full output of the function might be.
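In most shells the wait comes from the pipe behind the command substitution: the backgrounded subshell inherits the write end of that pipe, so the shell keeps reading until the subshell exits. A minimal sketch of one way around it, assuming the background job's output can be discarded (the 5-second timeout and the `date +%s` timing are only for illustration, and `date +%s` may be missing on older systems):

```shell
# Sketch: detach the background job from the substitution's pipe.
setAlarm() {
    # Redirecting stdout/stderr closes the job's copy of the pipe, so the
    # command substitution sees end-of-file as soon as echo has finished.
    ( sleep "$1"; kill -USR1 $$ ) >/dev/null 2>&1 &
    echo $!
}

trap 'echo "alarm received"' USR1

start=$(date +%s)
ALARM=$(setAlarm 5)
elapsed=$(( $(date +%s) - start ))
echo "ALARM=$ALARM assigned after ${elapsed}s"

kill "$ALARM" 2>/dev/null   # cancel the alarm again; this is only a demo
```

With the redirection in place the assignment should complete right away instead of after the full timeout.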
ASKER
Oops! I spoke too soon! The statement called by the function is still not behaving as a concurrent process. In the debug output, it pauses right after the return statement. I know I've seen this work, but I wrote that code quite a few years ago. I'm pretty sure I can access it, but I'll have to wait until I get home.
setAlarm() {
set -x
( sleep $1; kill -USR1 $$ ) &
echo $!
return
}
+ echo 15:05:19 [shtest_25Oct13] StreamNumber1 has not finished
+ 1>> /home/155477/NetBackup/logs/Just.Testing.FULL/queue.102513
+ [[ -n 15 ]]
+ + setAlarm 15
+ sleep 15
+ echo 8585372
+ return
+ kill -USR1 7733408
ALARM=8585372
+ soundAlarm
+ /home/155477/NetBackup/pre -exec 30 0
+ 1> /dev/null 2>> /home/155477/NetBackup/logs/Just.Testing.FULL/log.102513
Hmm. I've got another idea!
ASKER
Okay. I had to replace this...
ALARM=`setAlarm $TIMEOUT`
with...
setAlarm $TIMEOUT
ALARM=$!
I think it's working now.
This works when the function contains
return $!
because
echo $!
doesn't make sense anymore then.
ASKER
Hmm. Isn't the return statement updating the status register, and therefore must be between zero and two hundred and fifty-five?
Yep, you're right, in two ways.
First, of course, you can only return values between 0 and 255. Seems I'm getting too old (or demented) ...
Second, I obviously misread your solution, by assuming "$?" instead of "$!" - or did you edit the comment?? Anyway, since "$!" doesn't get overwritten by any other subprocess you can well refer to this variable outside of the function, so your solution will indeed work. Good job!
But there's still the problem that the process to kill with USR1 (the function's parent process $$) might well have ended in the meantime, so the signal handler for USR1 would never fire.
The script must survive for more than TIMEOUT seconds (by doing other work?) after the function call to get the signal handled.
wmp
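To illustrate both points: `$!` is still set after the function returns, and the USR1 handler only runs if the script is still alive when the timer expires. A small sketch with short illustrative delays (the `fired` flag and the busy loop are assumptions for the demo, not part of the original script):

```shell
fired=0
trap 'fired=1; echo "in USR1 handler"' USR1

setAlarm() {
    ( sleep "$1"; kill -USR1 $$ ) >/dev/null 2>&1 &
}

setAlarm 1
ALARM=$!            # PID of the background subshell started inside setAlarm
echo "ALARM=$ALARM"

# Keep the script busy for longer than the timeout so the signal has a target.
i=0
while [ "$i" -lt 4 ] && [ "$fired" -eq 0 ]; do
    sleep 1
    i=$((i + 1))
done
echo "fired=$fired"
```

If the loop were removed, the script would exit before the timer ran out and `kill -USR1 $$` would find no process to signal.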
ASKER
If I'm following you correctly, I have made provisions for terminating the background job when the parent exits. There's a trap command that I did not include in the sample code.
A trap command for USR1 will have nothing to trap (the "kill -USR1 $$" will have no target) if $$ (the PID of the shell calling the function) has exited during the TIMEOUT period.
I assume there is also a "trap ... EXIT" (or "trap ... 0"), or how did you make it work?
ASKER
I did find my 10-year-old code this weekend. It's residing on a Sun Ultra5 running Solaris 8!
Anyway, to answer your question, here's the statement:
trap "[ \"\$ALARM\" ] && ps -p \$ALARM >/dev/null 2>&1 && cancelAlarm" EXIT
The cancelAlarm function looks like this:
cancelAlarm() {
kill -KILL $ALARM
}
OK, makes sense.
As I guessed, there is an EXIT trap whose action consists of first testing the variable $ALARM for not being NULL, then checking "ps" for a process whose PID is stored in $ALARM and, if both tests succeed, calling cancelAlarm().
cancelAlarm() kills $ALARM which is the PID of the background process started by the function setAlarm() as soon as the main script exits (because it's initiated by an EXIT trap).
Nice work!
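The same EXIT trap can be sketched with single quotes (so no backslash escapes are needed) and with `kill -0` as the liveness test in place of `ps -p`. This is a variation under those assumptions, not the asker's code; the `sleep 30` is a hypothetical stand-in for the alarm subshell, and plain `kill` (TERM) is used instead of `-KILL`:

```shell
cancelAlarm() {
    kill "$ALARM" 2>/dev/null
}

# Single quotes delay expansion of $ALARM until the trap actually runs.
# kill -0 sends no signal; it only reports whether the PID can be signalled.
trap '[ -n "$ALARM" ] && kill -0 "$ALARM" 2>/dev/null && cancelAlarm' EXIT

sleep 30 >/dev/null 2>&1 &   # stand-in for the alarm subshell
ALARM=$!
echo "alarm process is $ALARM"
```

When the script ends, the trap finds the process still alive and cancels it; if the alarm has already finished, the `kill -0` test fails and nothing else happens.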
ASKER
Except...
The trap for the USR1 signal looks like this...
trap soundAlarm USR1
The evidence is that the parent sees the signal, but it doesn't execute the corresponding function until the parent process exits. That defeats the entire purpose because the alarm is intended to signal that the parent process is running long. The soundAlarm function looks like this...
soundAlarm() {
echo $ALARM_MESSAGE | mailx -s "$ALARM_SUBJECT" $RECIPIENTS
}
I need the soundAlarm function invoked as soon as the parent gets the USR1 signal. Either the parent is ignoring all signals until it terminates, or it's postponing execution of the function until it terminates. I know it's receiving the signal, because the soundAlarm function is executed when the parent terminates.
I'm not sure if I can understand all implications.
Could you please have a look at the following test piece?
If the parent runs longer than the function (MAINDELAY>TIMEOUT) it runs the "USR1" handler just fine and timely (after TIMEOUT has passed) , but the EXIT handler doesn't find anything to kill (as expected).
If the function runs longer than the parent (MAINDELAY<TIMEOUT) it runs the "EXIT" handler timely (kills what's running in the function) , but never reaches the USR1 handler (also as expected).
#!/usr/bin/ksh
set +xv
trap usr1_handler USR1
trap exit_handler EXIT
usr1_handler() {
echo in USR1 signal handler
echo $ALARM
}
exit_handler() {
echo in EXIT signal handler
kill $ALARM
}
setAlarm() {
set +xv
echo in function
( for i in $(seq 1 $1); do echo "F$i \c"; sleep 1; done; kill -USR1 $$ ) &
echo $!
return
}
MAINDELAY=30
TIMEOUT=10
setAlarm $TIMEOUT
ALARM=$!
echo in main
echo $ALARM
for i in $(seq 1 $MAINDELAY); do echo "M$i \c"; sleep 1; done
ASKER
Okay. I took your example and modified it a bit, but I think we've proven that the concept is viable. The problem I'm having is that the solution isn't scalable. Specifically, when I add code to the program, processing of the signal seems to get postponed until the program terminates. Obviously there's something in the code that I'm adding that's interfering with interrupt processing. It's as though the signal is being queued. The difference between the example you devised and the production code is a really gnarly nested 'if' statement - which looks something like this:
if ...; then
.
.
elif ...; then
.
.
else
if <call_to_an_external_script_goes_here>; then
.
.
else
.
.
fi
fi
When the logic falls through the if statement, the program is done; which appears to be when my signal finally gets processed. I wonder if it would make a difference if I changed the conditional as follows:
<call_to_an_external_script_goes_here>
if (( $? == 0 )); then
.
.
else
.
.
fi
I think you're still not showing the full picture.
Let's say we create an external script called ext_script reading:
#!/bin/ksh
for i in $(seq 1 $1); do echo "F$i \c"; sleep 1; done;
and we modify the original script like this (line 16):
#!/usr/bin/ksh
set +xv
trap usr1_handler USR1
trap exit_handler EXIT
usr1_handler() {
echo in USR1 signal handler
echo $ALARM
}
exit_handler() {
echo in EXIT signal handler
kill $ALARM
}
setAlarm() {
set +xv
echo in function
( ext_script $1; kill -USR1 $$ ) &
echo $!
return
}
MAINDELAY=30
TIMEOUT=10
setAlarm $TIMEOUT
ALARM=$!
echo in main
echo $ALARM
for i in $(seq 1 $MAINDELAY); do echo "M$i \c"; sleep 1; done
then the whole thing behaves just the same way as before. However gnarly the "if" construct is, it shouldn't change anything in that respect.
By the way, if you had to modify my example because you don't have the "seq" utility, I'd strongly suggest installing the GNU "coreutils" package from Michael Perzl's collection. You won't regret it!
ASKER
I don't think I've ever seen the seq operator. How does that work?
"seq" is not an operator, it's a tool.
It writes to stdout sequential numerical values whose boundaries are specified by the positional parameters (1) start and (2) end (ascending/descending).
If there are three parameters then the second value is interpreted as the increment which will otherwise default to "1".
"seq 1 10" will show
1
2
3
4
5
6
7
8
9
10
"seq 10 -1 1" will show (GNU seq prints nothing for "seq 10 1", so a descending run needs the negative increment)
10
9
8
7
6
5
4
3
2
1
"seq 1 2 10" will show
1
3
5
7
9
and finally "seq 10 -2 1" will show
10
8
6
4
2
Thus
for i in $(seq 1 10)
is the same as
for i in 1 2 3 4 5 6 7 8 9 10
The nice thing is that you don't have to know the boundaries in advance - just specify a variable/variables instead of the parameter(s).
ASKER
Hmm. It would appear that seq is not among the tools I have at my disposal...
ksh: seq: not found
Is it a shell built-in or a transient command?
As I said above, it's in the "coreutils" RPM available at http://www.perzl.org or in the AIX toolbox.
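If installing coreutils is not an option, the ascending case used in the test scripts can be approximated in plain shell. A minimal sketch; `count_up` is a hypothetical helper name, and it only covers the `seq FIRST LAST` form with an increment of 1:

```shell
# Print the integers from $1 to $2, one per line (like "seq FIRST LAST").
count_up() {
    n=$1
    while [ "$n" -le "$2" ]; do
        echo "$n"
        n=$((n + 1))
    done
}

# Same shape as the loops in the test script, without needing seq.
for i in $(count_up 1 5); do
    printf 'F%s ' "$i"
done
echo
```

As with seq, the boundaries can be variables, e.g. `$(count_up 1 $TIMEOUT)`.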
ASKER
So, here is the piece of code I've been testing with:
COUNTDOWN=20
while (( COUNTDOWN > 0 )); do
echo Still running...
sleep 1
(( COUNTDOWN -= 1 ))
done
If I run this code from the parent process, the signal is acknowledged when it is sent...
In main function...
in setAlarm function
Back in main function...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
in USR1 signal handler
Received USR1 at 11:25:15 ...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
+ echo in EXIT signal handler
+ 1>& 2
in EXIT signal handler
+ [[ -n 7995498 ]]
+ ps -p 7995498
+ 1> /dev/null 2>& 1
If, on the other hand, I move this code to an external file, the signal is not processed until the parent terminates...
In main function...
in setAlarm function
Back in main function...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
in USR1 signal handler
Received USR1 at 11:28:56 ...
+ echo in EXIT signal handler
+ 1>& 2
in EXIT signal handler
+ [[ -n 7995526 ]]
+ ps -p 7995526
+ 1> /dev/null 2>& 1
I've turned the code every which way but loose. I even tried running the external file as an argument to the exec command. I'm going to have to concede defeat on this one.
Seems that you're calling the external script from "main" instead of from inside the function as I assumed.
That makes a big difference, since this call is not executed in the background. The main script has to wait for the external script to return before any signals can be handled.
"exec" cannot help because the whole script will then be overwritten by the external one, all signal handling will be lost.
I think you're right, we'll not get any further here without a bare metal redesign of the entire logic.
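The deferral is easy to reproduce: while the shell waits for a foreground command, a trapped signal is only acted on once that command returns. A sketch with illustrative timings (it assumes `date +%s` is available; the `sleep 3` stands in for the external script):

```shell
trap 'fired_at=$(date +%s)' USR1

start=$(date +%s)
( sleep 1; kill -USR1 $$ ) >/dev/null 2>&1 &   # signal arrives after about 1s

sleep 3   # foreground command: the handler cannot run until this returns

delay=$(( fired_at - start ))
echo "handler ran ${delay}s after start"
```

The signal is sent after roughly one second, yet the handler records a time close to three seconds, i.e. only after the foreground `sleep` has finished.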
ASKER
Yeah, that exec command thing was pretty desperate, huh? LOL
One idea though - if you run the external script in background and issue two (!) "wait" statements - what happens?
I have this:
#!/usr/bin/ksh
set +xv
trap usr1_handler USR1
trap exit_handler EXIT
usr1_handler() {
echo in USR1 signal handler
echo $ALARM
}
exit_handler() {
echo in EXIT signal handler
kill $ALARM
}
setAlarm() {
set +xv
echo in function
( ext_script $1 F; kill -USR1 $$ ) &
echo $!
return
}
MAINDELAY=30
TIMEOUT=10
setAlarm $TIMEOUT
ALARM=$!
echo in main
echo $ALARM
ext_script $MAINDELAY M &
wait ; wait
and get this:
in function
893148
in main
893148
F1 M1 F2 M2 F3 M3 F4 M4 F5 M5 F6 M6 F7 M7 F8 M8 F9 M9 F10 M10
in USR1 signal handler
893148
M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30
in EXIT signal handler
kill: 893148: 0403-003 The specified process does not exist.
Forgot to mention - the external script now looks like this to make it more flexible in regard to the displayed prefix:
for i in $(seq 1 $1); do echo "$2$i \c"; sleep 1; done;
ASKER
Hmm. Interesting. I had tried a single wait statement without success. It appears that two waits might be working. Please enlighten me.
I think I'll need a bit of enlightenment myself.
"wait" without parameter should wait for all subprocesses to terminate, but this doesn't seem to be quite true.
Rather it seems that when the first one of these subprocesses terminates the EXIT handler is triggered, unless there's another "wait".
Instead of two times "wait" without parameter you can also use
wait $ALARM; wait $!
which works as well.
The drawback is that the first "wait" must be the one for the shorter running subprocess, which is of course not known in all cases.
Hmm.
I don't have any plausible explanation for this behaviour. It might be just "normal", but if so, why?
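One likely explanation, going by the POSIX description of `wait`: when a trapped signal arrives while the shell is sitting in `wait`, the `wait` returns immediately with a status above 128 and the trap action runs. The first `wait` is therefore ended by USR1, and the second one really waits for the remaining job. A sketch that makes this visible (the `alarmed` flag, the exit code 7 and the delays are illustrative assumptions):

```shell
alarmed=0
trap 'alarmed=1' USR1

( sleep 3; exit 7 ) >/dev/null 2>&1 &          # stands in for ext_script
ext_pid=$!
( sleep 1; kill -USR1 $$ ) >/dev/null 2>&1 &   # the alarm

first=0
wait "$ext_pid" || first=$?    # cut short by USR1: status is above 128
STATUS=$first
if [ "$alarmed" -eq 1 ] && [ "$first" -gt 128 ]; then
    STATUS=0
    wait "$ext_pid" || STATUS=$?   # second wait collects the real exit status
fi
echo "first wait: $first, final status: $STATUS"
```

Checking the flag and the status like this avoids having to know in advance which of the two jobs finishes first.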
ASKER
I guess you can't have it all. The "wait; wait" statement works, but I also need to be able to capture the exit status of the external command. Currently, I'm running the external command in the background like this:
( ext_command && cancelAlarm || ( cancelAlarm && false ) ) &
I've also tried...
( ext_command && cancelAlarm || ( cancelAlarm; STATUS=1 ) ) &
The STATUS variable is a global defined in the parent process, but the background job doesn't seem to be modifying it. Perhaps there's too much nesting.
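A variable assigned inside `( ... )` or in a background job lives in a child process, so the parent's STATUS never changes, however shallow the nesting. The usual way out is to let the job exit with the status and pick it up with `wait`. A sketch, with `ext_command` and `cancelAlarm` replaced by hypothetical stubs:

```shell
ext_command() { return 3; }   # stub: pretend the external command failed with 3
cancelAlarm() { :; }          # stub: the real one kills the alarm process

STATUS=0
( ext_command || STATUS=1 ) &     # the assignment happens in the child only
wait $!
seen_by_parent=$STATUS
echo "after subshell assignment: STATUS=$seen_by_parent"

# Carry the status out through the job's exit code instead.
( ext_command; rc=$?; cancelAlarm; exit "$rc" ) &
STATUS=0
wait $! || STATUS=$?
echo "via wait: STATUS=$STATUS"
```

Note that if a trapped signal such as USR1 interrupts the `wait`, it returns a status above 128 and has to be repeated to get the job's own exit code.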
ASKER CERTIFIED SOLUTION
echo $!
instead of "echo=$!"
The delay should occur between "return" and "kill -USR1 ...", but not at all between "kill ..." and the start of your signal handler routine (resp. the debug display of the ALARM variable).
Strange that "return" does not appear in the debug log. Are you sure that you're posting the actual code?
By the way, inside the function "sleep $1" should suffice, since you're passing the timeout value as a parameter to it.
wmp