Edgar Cole

Running a background process from a Korn shell function

I have a Korn shell script containing the following code:

setAlarm() {
  set -x
  ( sleep $TIMEOUT; kill -USR1 $$ ) &
  echo=$!
  return
}

This function is called in the following statement:

[ "$TIMEOUT" ] && ALARM=`setAlarm $TIMEOUT`

When I run the script with the debug flag, I get the following:

+ [ 60 ]
+ + setAlarm 60
+ sleep 60
+ echo=11075826
+ kill -USR1 7733266
### SIXTY SECOND PAUSE ###
ALARM=
+ soundAlarm
+ /home/155477/NetBackup/pre-exec 120 0

The problem is that there is a sixty-second pause between the statement "kill -USR1 7733266" and "ALARM=". I expected the ALARM variable assignment to happen immediately after the setAlarm function returns. Not only is the assignment delayed, but the assigned value is null. The setAlarm function is supposed to return the PID of the command group that it runs in the background, and that value should be assigned to the variable named ALARM. Instead, the script waits on a function that appears to return immediately, and the value the function returns is lost.

woolmilkporc

To populate the variable you should use

echo $!

instead of "echo=$!"

The delay should occur between "return" and "kill -USR1 ...", not between "kill ..." and the start of your signal handler routine (or the debug display of the ALARM variable).
Strange that "return" does not appear in the debug log. Are you sure that you're posting the actual code?

By the way, inside the function "sleep $1" should suffice, since you're passing the timeout value to it as a parameter.

wmp
<< a function that appears to return immediately >>

Called on its own, the function would indeed return immediately, but the background job might later find no process to send USR1 to, because the parent process may well have ended in the meantime.

When used in a variable assignment via command substitution the assignment can only take place when everything which is needed for it has finished.

The shell has to wait for everything inside the function to finish in order to be able to decide what the full output of the function might be.
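
For what it's worth, the wait described above comes from the pipe that command substitution reads from: a background job started inside the function inherits the write end, and the shell keeps reading until every writer has closed it. A minimal sketch, assuming the timer itself does not need to print anything, redirects the job's output so the substitution can return at once (not verified on every ksh version):

```shell
setAlarm() {
  # Redirecting stdout detaches the background job from the pipe that the
  # command substitution reads, so only "echo $!" feeds the caller.
  ( sleep "$1"; kill -USR1 $$ ) >/dev/null 2>&1 &
  echo $!
}
```

Called as ALARM=`setAlarm $TIMEOUT`, this should assign the PID without waiting for the sleep to finish. Discarding stderr as well hides any "set -x" trace from the background job, so drop the 2>&1 while debugging.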

Edgar Cole (ASKER)

Oops! I spoke too soon! The statement called by the function is still not behaving as a concurrent process. In the debug output, it pauses right after the return statement. I know I've seen this work, but I wrote that code quite a few years ago. I'm pretty sure I can access it, but I'll have to wait until I get home.

setAlarm() {
  set -x
  ( sleep $1; kill -USR1 $$ ) &
  echo $!
  return
}

+ echo 15:05:19 [shtest_25Oct13] StreamNumber1 has not finished
+ 1>> /home/155477/NetBackup/logs/Just.Testing.FULL/queue.102513
+ [[ -n 15 ]]
+ + setAlarm 15
+ sleep 15
+ echo 8585372
+ return
+ kill -USR1 7733408
ALARM=8585372
+ soundAlarm
+ /home/155477/NetBackup/pre-exec 30 0
+ 1> /dev/null 2>> /home/155477/NetBackup/logs/Just.Testing.FULL/log.102513

Hmm. I've got another idea!
Okay. I had to replace this...

ALARM=`setAlarm $TIMEOUT`

with...

setAlarm $TIMEOUT
ALARM=$!

I think it's working now.
This works when the function contains

return $!

because

echo $!

doesn't make sense anymore then.
Hmm. Isn't the return statement updating the status register, which must therefore be between zero and two hundred and fifty-five?
Yep, you're right, in two ways.

First, of course, you can only return values between 0 and 255. Seems I'm getting too old (or demented) ...

Second, I obviously misread your solution, by assuming "$?" instead of "$!" - or did you edit the comment?? Anyway, since "$!" is not overwritten until another background job is started, you can safely refer to this variable outside of the function, so your solution will indeed work. Good job!
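
The point about "$!" can be checked with a small sketch. A function called directly runs in the calling shell rather than in a child process, so "$!" set inside it is still visible to the caller, whereas "return" can only carry an exit status of 0 to 255. The name startTimer is made up for illustration:

```shell
startTimer() {
  # Start a background job; its PID is left in $! for the caller.
  sleep "$1" >/dev/null 2>&1 &
}

startTimer 2
TIMER_PID=$!   # still set here, because the function ran in this shell

# By contrast, $(startTimer 2) would run the function in a subshell,
# and $! in the calling shell would be left unchanged.
```

This is why replacing the command substitution with a plain call followed by ALARM=$! works.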

But there's still the problem that the process to kill with USR1 (the function's parent process $$) might well have ended in the meantime, so the signal handler for USR1 would never fire.
The script must survive for more than TIMEOUT seconds (by doing other work?) after the function call to get the signal handled.

wmp
If I'm following you correctly, I have made provisions for terminating the background job when the parent exits. There's a trap command that I did not include in the sample code.
A trap command for USR1 will have nothing to trap (the kill -USR1 $$ will have no target) if $$ (the PID of the shell calling the function) has exited during the TIMEOUT period.

I assume there is also a "trap ... EXIT" (or "trap ... 0"), or how did you make it work?
I did find my 10-year-old code this weekend. It's residing on a Sun Ultra5 running Solaris 8!

Anyway, to answer your question, here's the statement:

trap "[ \"\$ALARM\" ] && ps -p \$ALARM >/dev/null 2>&1 && cancelAlarm" EXIT

The cancelAlarm function looks like this:

cancelAlarm() {
  kill -KILL $ALARM
}

OK, makes sense.

As I guessed, there is an EXIT trap whose action consists of first testing the variable $ALARM for not being NULL, then checking "ps" for a process whose PID is stored in $ALARM and, if both tests succeed, calling cancelAlarm().

cancelAlarm() kills $ALARM which is the PID of the background process started by the function setAlarm() as soon as the main script exits (because it's initiated by an EXIT trap).
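
A minimal sketch of the same cleanup, assuming ALARM holds the PID saved from "$!". It swaps the "ps -p" test for "kill -0", which sends no signal and only reports whether the process can be signalled, and it sends the default TERM rather than KILL. Both are substitutions, not what the original script does:

```shell
cancelAlarm() {
  # kill -0 delivers nothing; it succeeds only if the PID exists
  # and may be signalled by this user.
  if [ -n "$ALARM" ] && kill -0 "$ALARM" 2>/dev/null; then
    kill "$ALARM"
  fi
}

# Run the cleanup when the script exits.
trap cancelAlarm EXIT
```

Keeping the tests inside the function means the trap line needs no escaped quotes.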

Nice work!
Except...

The trap for the USR1 signal looks like this...

trap soundAlarm USR1

The evidence is that the parent sees the signal, but it doesn't execute the corresponding function until the parent process exits. That defeats the entire purpose because the alarm is intended to signal that the parent process is running long. The soundAlarm function looks like this...

soundAlarm() {
  echo $ALARM_MESSAGE | mailx -s "$ALARM_SUBJECT" $RECIPIENTS
}

I need the soundAlarm function invoked as soon as the parent gets the USR1 signal. Either the parent is ignoring all signals until it terminates, or it's postponing execution of the function until it terminates. I know it's receiving the signal, because the soundAlarm function is executed when the parent terminates.
I'm not sure I understand all the implications.

Could you please have a look at the following test piece?

If the parent runs longer than the function (MAINDELAY>TIMEOUT), it runs the "USR1" handler just fine and on time (after TIMEOUT has passed), but the EXIT handler doesn't find anything to kill (as expected).

If the function runs longer than the parent (MAINDELAY<TIMEOUT), it runs the "EXIT" handler on time (killing what's running in the function), but never reaches the USR1 handler (also as expected).

#!/usr/bin/ksh
set +xv
trap usr1_handler USR1
trap exit_handler EXIT

usr1_handler() {
echo in USR1 signal handler
echo $ALARM
}

exit_handler() {
echo in EXIT signal handler
kill $ALARM
}

setAlarm() {
  set +xv
  echo in function
  ( for i in $(seq 1 $1); do echo "F$i \c"; sleep 1; done; kill -USR1 $$ ) &
  echo $!
  return
}

MAINDELAY=30
TIMEOUT=10

setAlarm $TIMEOUT

ALARM=$!
echo in main
echo $ALARM
for i in $(seq 1 $MAINDELAY); do echo "M$i \c"; sleep 1; done

Okay. I took your example and modified it a bit, but I think we've proven that the concept is viable. The problem I'm having is that the solution isn't scalable. Specifically, when I add code to the program, processing of the signal seems to get postponed until the program terminates. Obviously there's something in the code that I'm adding that's interfering with interrupt processing. It's as though the signal is being queued. The difference between the example you devised and the production code is a really gnarly nested 'if' statement - which looks something like this:

if ...; then
    .
    .
elif ...; then
    .
    .
else
    if <call_to_an_external_script_goes_here>; then
        .
        .
    else
        .
        .
    fi
fi

When the logic falls through the if statement, the program is done; which appears to be when my signal finally gets processed. I wonder if it would make a difference if I changed the conditional as follows:

<call_to_an_external_script_goes_here>
if (( $? == 0 )); then
    .
    .
else
    .
    .
fi
I think you're still not showing the full picture.

Let's say we create an external script called ext_script reading:
#!/bin/ksh
 for i in $(seq 1 $1); do echo "F$i \c"; sleep 1; done;

and we modify the original script like this (line 16):
#!/usr/bin/ksh
set +xv
trap usr1_handler USR1
trap exit_handler EXIT
usr1_handler() {
echo in USR1 signal handler
echo $ALARM
}
exit_handler() {
echo in EXIT signal handler
kill $ALARM
}
setAlarm() {
  set +xv
  echo in function
  ( ext_script $1; kill -USR1 $$ ) &
  echo $!
  return
}

MAINDELAY=30
TIMEOUT=10
setAlarm $TIMEOUT
ALARM=$!
echo in main
echo $ALARM
for i in $(seq 1 $MAINDELAY); do echo "M$i \c"; sleep 1; done

then the whole thing behaves just the same way as before. However gnarly it is, an "if" construct shouldn't change anything in that respect.

By the way, if you had to modify my example because you don't have the "seq" utility I'd strongly suggest installing the GNU "coreutils" package from Michael Perzl's collection. You won't regret it!
I don't think I've ever seen the seq operator. How does that work?
"seq" is not an operator, it's a tool.

It writes sequential numerical values to stdout, with the boundaries given by the positional parameters (1) start and (2) end.

If there are three parameters, the second value is interpreted as the increment, which otherwise defaults to "1". GNU seq needs a negative increment to count down.

"seq 1 10" will show

1
2
3
4
5
6
7
8
9
10

"seq 10 -1 1" will show

10
9
8
7
6
5
4
3
2
1

"seq 1 2 10" will show

1
3
5
7
9

and finally "seq 10 -2 1" will show

10
8
6
4
2

Thus

for i in $(seq 1 10)

is the same as

for i in 1 2 3 4 5 6 7 8 9 10

The nice thing is that you don't have to know the boundaries in advance - just specify a variable/variables instead of the parameter(s).
Hmm. It would appear that seq is not among the tools I have at my disposal...

ksh: seq:  not found

 Is it a shell built-in or a transient command?
As I said above, it's in the "coreutils" RPM available at http://www.perzl.org or in the AIX toolbox.
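
Where installing coreutils is not an option, a small shell function can stand in for the simple ascending case. This is only a sketch: count_seq is a made-up name, it handles just "first last" with an increment of 1, and it assumes a shell that supports $(( )) arithmetic:

```shell
# count_seq FIRST LAST: print the integers FIRST..LAST, one per line.
# Ascending only; prints nothing if FIRST is greater than LAST.
count_seq() {
  _n=$1
  while [ "$_n" -le "$2" ]; do
    echo "$_n"
    _n=$((_n + 1))
  done
}

for i in $(count_seq 1 5); do
  echo "step $i"
done
```

In the test scripts above, $(seq 1 $1) could then be written as $(count_seq 1 $1).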
So, here is the piece of code I've been testing with:
COUNTDOWN=20
while (( COUNTDOWN > 0 )); do
  echo Still running...
  sleep 1
  (( COUNTDOWN -= 1 ))
done

If I run this code from the parent process, the signal is acknowledged when it is sent...
In main function...
in setAlarm function
Back in main function...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
in USR1 signal handler
Received USR1 at 11:25:15 ...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
+ echo in EXIT signal handler
+ 1>& 2
in EXIT signal handler
+ [[ -n 7995498 ]]
+ ps -p 7995498
+ 1> /dev/null 2>& 1

If, on the other hand, I move this code to an external file, the signal is not processed until the parent terminates...
In main function...
in setAlarm function
Back in main function...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
Still running...
in USR1 signal handler
Received USR1 at 11:28:56 ...
+ echo in EXIT signal handler
+ 1>& 2
in EXIT signal handler
+ [[ -n 7995526 ]]
+ ps -p 7995526
+ 1> /dev/null 2>& 1

I've turned the code every which way but loose. I even tried running the external file as an argument to the exec command. I'm going to have to concede defeat on this one.
Seems that you're calling the external script from "main" instead of from inside the function as I assumed.

That makes a big difference, since this call is not executed in the background. The main script has to wait for the external script to return before any signals can be handled.

"exec" cannot help because the whole script will then be overwritten by the external one, and all signal handling will be lost.

I think you're right, we'll not get any further here without a bare metal redesign of the entire logic.
Yeah, that exec command thing was pretty desperate, huh? LOL
One idea though - if you run the external script in background and issue two (!) "wait" statements - what happens?

I have this:

#!/usr/bin/ksh
set +xv
trap usr1_handler USR1
trap exit_handler EXIT
usr1_handler() {
echo in USR1 signal handler
echo $ALARM
}
exit_handler() {
echo in EXIT signal handler
kill $ALARM
}
setAlarm() {
  set +xv
  echo in function
  ( ext_script $1 F; kill -USR1 $$ ) &
  echo $!
  return
}

MAINDELAY=30
TIMEOUT=10
setAlarm $TIMEOUT
ALARM=$!
echo in main
echo $ALARM
ext_script  $MAINDELAY M &
wait  ; wait


and get this:

in function
893148
in main
893148
F1 M1 F2 M2 F3 M3 F4 M4 F5 M5 F6 M6 F7 M7 F8 M8 F9 M9 F10 M10
in USR1 signal handler
893148
M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30
 in EXIT signal handler
kill: 893148: 0403-003 The specified process does not exist.
Forgot to mention - the external script now looks like this to make it more flexible in regard to the displayed prefix:

 for i in $(seq 1 $1); do echo "$2$i \c"; sleep 1; done;
Hmm. Interesting. I had tried a single wait statement without success. It appears that two waits might be working. Please enlighten me.
I think I'll need a bit of enlightenment myself.

"wait" without parameter should wait for all subprocesses to terminate, but this doesn't seem to be quite true.

Rather it seems that when the first one of these subprocesses terminates the EXIT handler is triggered, unless there's another "wait".

Instead of two times "wait" without parameter you can also use

wait $ALARM; wait $!

which works as well.

The drawback is that the first "wait" must be the one for the shorter running subprocess, which is of course not known in all cases.

Hmm.

I don't have any plausible explanation for this behaviour. It might be just "normal", but if so, why?
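
One plausible explanation, going by the POSIX description of wait: when a signal with a trap set arrives while the shell is in wait, wait returns at once with a status above 128 and the trap then runs. The first wait is therefore cut short by USR1, and without a second one the script reaches its end and fires the EXIT handler. A sketch that avoids guessing how many waits are needed loops until the job is gone. The names on_alarm and job are made up, and the one- and two-second sleeps stand in for the timer and the external script:

```shell
on_alarm() {
  # Record whether the long job was still running when the signal was handled.
  if kill -0 "$job" 2>/dev/null; then job_alive_at_alarm=yes; fi
}
trap on_alarm USR1

job_alive_at_alarm=no

# Long-running job, standing in for the external script.
sleep 2 >/dev/null 2>&1 &
job=$!

# Timer, standing in for setAlarm: signal this shell after one second.
( sleep 1; kill -USR1 $$ ) >/dev/null 2>&1 &

# A trapped signal makes wait return early, so wait again while the job exists.
# "|| :" only keeps an early, non-zero return from aborting under "set -e".
while kill -0 "$job" 2>/dev/null; do
  wait "$job" || :
done

echo "job still running when alarm fired: $job_alive_at_alarm"
```

Because the loop waits on one specific PID, it does not matter whether the timer or the external script finishes first.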
I guess you can't have it all. The "wait; wait" statement works, but I also need to be able to capture the exit status of the external command. Currently, I'm running the external command in the background like this:

( ext_command && cancelAlarm || ( cancelAlarm && false ) ) &

I've also tried...

( ext_command && cancelAlarm || ( cancelAlarm; STATUS=1 ) ) &

The STATUS variable is a global defined in the parent process, but the background job doesn't seem to be modifying it. Perhaps there's too much nesting.
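
The variable is most likely lost because everything inside "( ... ) &" runs in a child process, so an assignment there never reaches the parent, however little nesting there is. A sketch of one way around it: let the background job's exit status carry the result and pick it up with "wait PID". Here ext_command and cancelAlarm are stubs so that the example runs on its own:

```shell
ext_command() { return 3; }   # stub: pretend the external command failed with 3
cancelAlarm() { :; }          # stub: the real one would kill the timer

# 1. Assigning inside the background subshell changes only the child's copy.
STATUS=0
( ext_command || STATUS=1 ) &
wait $!
lost=$STATUS                  # the parent's STATUS was never touched

# 2. Preserve the command's status across cancelAlarm and exit with it.
( rc=0; ext_command || rc=$?; cancelAlarm; exit "$rc" ) &
job=$!

# 3. wait PID returns the exit status of that background job.
STATUS=0
wait "$job" || STATUS=$?

echo "lost=$lost STATUS=$STATUS"
```

With these stubs the last line prints lost=0 STATUS=3. If a trapped signal such as USR1 can arrive during the wait, wait may return early with a status above 128, so the call may have to be repeated until the job has really finished.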