Solved

How to kick off a command in parallel on a list of machines and monitor the results

Posted on 2008-10-23
Last Modified: 2013-12-26
Hi,

I posted a question a few days ago about how to kick off a command (ssh $host "command...").

Someone suggested that I do the following:

#!/bin/sh
for host in `cat /list/of/hosts`
do
  ssh $host "some command" &
done

It is good. However, what do I need to do in the script to check whether the background process is complete before moving on to the next step? Basically, I want each command (that is, each background process) to be complete before moving to the next step.
Question by:xewoox
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 22789229
So leave out the '&'!
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 22789266
Hi,
some explanation:
By terminating your command you create a background job. Control is immediately returned to the shell so the next command can run.
By not terminating with a '&' the shell will wait until the command completes before starting the next one.
Greetings
wmp
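
A small sketch of that difference, with sleep standing in for the ssh command (an assumption, so it runs locally). It relies on date +%s, which is widely available but not required by POSIX.

```sh
#!/bin/sh
# Foreground: each sleep must finish before the next one starts (about 2 s).
start=$(date +%s)
sleep 1
sleep 1
fg_elapsed=$(( $(date +%s) - start ))

# Background: both sleeps start at once; wait blocks until both have
# finished (about 1 s).
start=$(date +%s)
sleep 1 &
sleep 1 &
wait
bg_elapsed=$(( $(date +%s) - start ))

echo "foreground: ${fg_elapsed}s, background: ${bg_elapsed}s"
```

The background run takes roughly as long as its slowest job rather than the sum of all jobs.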
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 22789284
ouh,
the first sentence should read: ... terminating your command with a '&' ...
Don't know what happened there ...

 

Author Comment

by:xewoox
ID: 22789306
I can use $! and put all the process IDs into a file. Then I can read the file and check, one by one, whether each PID has completed. I also need some kind of timeout. I am very new to writing scripts and don't know much about it. I am hoping some of you have done this before and can provide working examples that I can modify for my own purpose.

Thanks!
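
One way to sketch that idea, under stated assumptions: the PIDs are kept in a shell variable rather than a file, sleep stands in for the ssh calls, and TIMEOUT=3 is an arbitrary value. kill -0 sends no signal and only tests whether the process still exists.

```sh
#!/bin/sh
TIMEOUT=3            # seconds to allow (hypothetical value)
pids=""

# Stand-ins for: ssh $host "some command" &
sleep 1 &
pids="$pids $!"
sleep 30 &
pids="$pids $!"

elapsed=0
while :; do
    running=""
    for pid in $pids; do
        # kill -0 only checks that the process exists; nothing is killed here
        kill -0 "$pid" 2>/dev/null && running="$running $pid"
    done
    [ -z "$running" ] && break              # everything finished in time
    [ "$elapsed" -ge "$TIMEOUT" ] && break  # out of time
    sleep 1
    elapsed=$((elapsed + 1))
done

# Anything still listed in $running has overrun the limit: terminate it.
for pid in $running; do
    kill "$pid" 2>/dev/null
done
wait
echo "killed after timeout:${running:- none}"
```

Here the one-second job finishes by itself and the thirty-second job is killed once the limit is reached.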
 
LVL 5

Expert Comment

by:zmo
ID: 22789388
But if you want to move on to the next command for each host without waiting for all the hosts to complete (which is what would happen if you removed the &), you can do it in two different ways, depending on your needs.

#!/bin/sh
for host in `cat /list/of/hosts`
do
  ssh $host "some command" && <execute the next command here> &
done

or

#!/bin/sh
for host in `cat /list/of/hosts`
do
  ssh $host "some command" &
  PID=$!
  <execute something else here>
  wait $PID && <execute the next command here> &
done
 
LVL 5

Expert Comment

by:zmo
ID: 22789422
if you need any explanations, just ask ;)
 

Author Comment

by:xewoox
ID: 22789545
Thank you for helping me....

So, wait $PID will wait until the process is done? If it does, then we are not executing the commands in parallel. So I don't quite understand the example.

Now, if I want to give the command x amount of time to run, do I put a sleep before wait $PID?
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 22789659
You could put a 'sleep x ;' before a 'kill $PID'

 

Author Comment

by:xewoox
ID: 22789726
Hi,

I need some explanation of how the two examples above run the command in parallel.

Thanks!
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 22789759
They don't
 
LVL 5

Expert Comment

by:zmo
ID: 22789805
Hm... given the tests I ran, I was wrong, sorry :/

1/ ( sleep 10 && echo a ) &

is the fix for the first suggestion.

2/ is wrong. A fix I thought would work is:

for host in `cat /list/of/hosts`
do
  ssh $host "some command" &
  PID=$!
  <execute something else here>
  ( wait $PID && <execute the next command here> ) &
done

but it does not work, because wait has to run in the same shell that started $PID :-S
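
A minimal sketch of that limitation, with sleep as a stand-in for the ssh job (an assumption). POSIX specifies exit status 127 when wait is given a PID the current shell does not know; bash also prints a "not a child of this shell" message, which is discarded here.

```sh
#!/bin/sh
sleep 1 &
pid=$!

# ( ... ) runs in a separate shell process. $pid is not a child of that
# process, so wait cannot wait for it and fails instead of blocking.
( wait "$pid" ) 2>/dev/null
sub_rc=$?

# In the shell that started the job, wait blocks until the job exits
# and returns the job's exit status.
wait "$pid"
own_rc=$?

echo "wait in subshell: $sub_rc, wait in parent: $own_rc"
```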

> So, wait $PID will wait until the process is done? If it does, then we are not executing the commands in parallel. So I don't quite understand the example.

Well yes, that's what it should have done.

So if I understand correctly, what you want is to launch all commands at the same time, i.e. the for loop starts every ssh and ends, and when all of them are done, you execute the next command. Then wait for my next comment, I'm writing that for you.

Fix 1/ applied to the loop:

for host in `cat /list/of/hosts`
do
    ( ssh $host "some command" && <execute the next command here> ) &
done
 

Author Comment

by:xewoox
ID: 22789808
Mmmm...

Now I am confused. I ran a test with scp instead of ssh to copy a large file to 2 machines.

#!/bin/sh
for host in `cat /list/of/hosts`
do
  scp <a big file> $host:/tmp  &
  PID=$!
  wait $PID  && echo "PID=$PID is done"
done

While I am waiting, I see the copy happening on both machines. Why are they not executed in parallel?
 

Author Comment

by:xewoox
ID: 22789844
Thanks... please ignore my previous comment above. Yes, the wait makes them run sequentially.

I am waiting for your new script.   :=}
 

Author Comment

by:xewoox
ID: 22789869
My previous comment about "I am confused" was a response to "woolmilkporc".

My previous comment about "waiting for new script" was a response to "zmo".

Thank you for all your help.
 
LVL 5

Expert Comment

by:zmo
ID: 22789899
OK, here is something simple that works. Is that what you want?

for host in `cat /list/of/hosts`
do
    ssh $host "some command" &
done

# this command waits until all the background jobs end
wait

# when all ssh are terminated, here you go
<execute next command>
 

Author Comment

by:xewoox
ID: 22789962
Yes.

However, is there a way we can capture the return code of the command?

Say I have a script (test) to drive the command. This script returns a value, say 0 or 1. Say the command copies a file to a remote machine with scp; please see below. The script "test" will return either 0 or 1. In your example, is there a way I can check the return code from the command running on the remote machine?
I hope so.

for host in `cat /list/of/hosts`
do
    test scp file $host:/tmp &
done
 
# this command waits until all the background jobs end
wait
 
# when all ssh are terminated, here you go
<execute next command>
 

Author Comment

by:xewoox
ID: 22790044
Hi,

Your example doesn't seem to work.

I added an echo "Now we wait" before "wait" and an echo "Wait is over" after the wait statement. The script exits and I see these two echoes, but the copying is still in progress.
 
LVL 5

Expert Comment

by:zmo
ID: 22790097
well, the variable  $? contains the return value of the last command. I don't know if ssh returns $? for the command it executes, but I assume it does so.

Now it depends on what you want to do with the return value.
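
For reference, the ssh manual page states that ssh exits with the exit status of the remote command, or with 255 if an error occurred. A minimal sketch of reading that status follows; describe_status is a hypothetical helper, and sh -c 'exit 3' stands in for the ssh call so the snippet runs without a remote host.

```sh
#!/bin/sh
# describe_status: hypothetical helper that interprets an ssh-style exit status.
describe_status() {
    case $1 in
        0)   echo "success" ;;
        255) echo "ssh itself failed, or the remote command returned 255" ;;
        *)   echo "remote command failed with status $1" ;;
    esac
}

# Stand-in for: ssh "$host" "some command"
sh -c 'exit 3'
rc=$?      # read $? immediately; the next command overwrites it
describe_status "$rc"
```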
 
LVL 5

Expert Comment

by:zmo
ID: 22790121
This is strange, I tried this snippet, and it worked...

for i in 5 6 7 8
do
    sleep $i &
done

echo "Now we wait"

# this command waits until all the background jobs end
wait

echo "Wait is over"
 

Author Comment

by:xewoox
ID: 22790124
Yes, I can use $?, but how do I get the $? for each command running on the remote machines? Where should I add the code that reads $??
 
LVL 5

Expert Comment

by:zmo
ID: 22790137
Hm... did you actually add "test" before the command?

Rename your script to test.sh or something else; test is a built-in shell command.
 

Author Comment

by:xewoox
ID: 22790144
Yes, sorry, your example does work. :=}

My command was bad, so the process finished instantly, and I thought it didn't work. Sorry.

Anyway, any idea how to capture the return code from each command execution?
 

Author Comment

by:xewoox
ID: 22790154
No, my script is not called "test"... I just made the name up quickly for the post here.
 
LVL 5

Expert Comment

by:zmo
ID: 22790156
Well, what do you want to do with the return value?
If one of the commands has failed, skip the next step for every host?
Or if one of the commands has failed, skip the next step only for the host that failed?
 

Author Comment

by:xewoox
ID: 22790171
No.

I want the command to run on each host, and I want them to run in parallel. At the end I need to know, for each host, whether it succeeded or failed.
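
A sketch of one way to get a per-host result while still running everything in parallel: remember each job's PID next to its host, then call wait on each PID in turn. The run_on function and the host names are hypothetical stand-ins for ssh and /list/of/hosts, chosen so the example runs locally.

```sh
#!/bin/sh
# run_on: stand-in for `ssh "$host" "some command"`.
# Hosts whose name starts with "bad" fail with status 3.
run_on() {
    case $1 in
        bad*) sleep 1; return 3 ;;
        *)    sleep 1; return 0 ;;
    esac
}

hosts="alpha bad-beta gamma"    # hypothetical host names
pids=""
for host in $hosts; do
    run_on "$host" &
    pids="$pids $host:$!"       # remember which PID belongs to which host
done

results=""
for entry in $pids; do
    host=${entry%:*}
    pid=${entry##*:}
    # wait PID returns the exit status of that particular background job
    if wait "$pid"; then
        results="$results $host=ok"
    else
        results="$results $host=failed($?)"
    fi
done
echo "results:$results"
```

All jobs start before the first wait, so the total run time is about that of the slowest job, and each host still gets its own status.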
 
LVL 5

Expert Comment

by:zmo
ID: 22790195
ah ok, hm... let me put this up
 
LVL 5

Expert Comment

by:zmo
ID: 22790223
Here it is:

for host in `cat /list/of/hosts`
do
    ssh $host "some command" && $host $? >> commands.log &
done

# this command waits until all the background jobs end
wait

# when all ssh are terminated, here you go
<execute next command>
 
LVL 5

Expert Comment

by:zmo
ID: 22790233
Oops, I forgot a "detail" :p

for host in `cat /list/of/hosts`
do
    # log the host and ssh's exit status, whether the command succeeded or failed
    ( ssh $host "some command"; echo $host $? >> commands.log ) &
done

# this command waits until all the background jobs end
wait

# when all ssh are terminated, here you go
<execute next command>
 
LVL 5

Expert Comment

by:zmo
ID: 22790791
By the way, if you want better log output, you can change the echo to what is below:

for host in `cat /list/of/hosts`
do
    # "success" is logged when ssh returns 0, "failed" otherwise
    ( ssh $host "some command" && echo $host ": success" || echo $host ": failed" ) >> commands.log &
done

# this command waits until all the background jobs end
wait

# when all ssh are terminated, here you go
<execute next command>
 

Author Comment

by:xewoox
ID: 22791194
Thank you.

I will try this out later today. Got to go now.

Thank you again.
 

Author Comment

by:xewoox
ID: 22796931
Hi,

I tried it out.

It works.

Another question for you.

I want to write another script, A, to drive this script B (which does the parallel execution using the example you gave me above).

The new script A will basically time out the execution of script B.

In script A, I will invoke script B as

"$@" > /dev/null &

Script B will then issue those scp commands in parallel.

If I do a ps -ef | grep <script B>, I see more than one occurrence of script B.

What command can I use to get the number of processes that script B is occupying?

My plan is to check this count: when it reaches 0, I know they are done. If the timeout is reached and the count is not 0, I will kill the remaining processes.
 
LVL 5

Expert Comment

by:zmo
ID: 22798144
OK, here is what you want:

launch_commands.sh:

echo $(date +"%b %d %H:%M:%S ") "starting script" >> commands.log

for i in 10 9 8 7 3 -1
do
    # discard stdout and stderr of the command; log a line if it fails or times out
    sh timeout.sh ssh $host "some command" > /dev/null 2>&1 || echo $(date +"%b %d %H:%M:%S ") $i ": failed" >> commands.log &
done

# this command waits until all the background jobs end
wait

# when all ssh are terminated, here you go
<following commands>
 
LVL 5

Expert Comment

by:zmo
ID: 22798160
Oops, in the previous post the for line is of course wrong (and $i in the echo should then be $host). It should be:

for host in `cat /list/of/hosts`

;)

And here is the "timeout.sh" script:

TIMEOUT=60 # sets a timeout of 60 seconds, change this to set up the timeout time
HOST=$2

( sleep $TIMEOUT && kill $$ && echo $(date +"%b %d %H:%M:%S ") $HOST ": timeout !" >> commands.log ) &

$@ && ( echo $(date +"%b %d %H:%M:%S ") $HOST ": success" >> commands.log || echo $HOST ": failed" ) >> commands.log
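
The watchdog idea can also be sketched so that the command itself is killed and the watchdog is stopped once the command finishes. This is a sketch under assumptions, not a drop-in replacement for timeout.sh: run_with_timeout is a hypothetical function name, and sh -c and sleep stand in for the ssh call.

```sh
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND, kills it if it is still running after SECONDS, and
# returns COMMAND's exit status (above 128 if it was killed by a signal).
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!

    # Watchdog: sleep, then terminate the command if it still exists.
    # Its output is discarded so it never holds the caller's stdout open.
    ( sleep "$limit"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    dog_pid=$!

    wait "$cmd_pid"
    status=$?

    # The command is done: stop the watchdog. Its inner sleep may linger
    # until the limit expires, but it can no longer kill anything useful.
    kill "$dog_pid" 2>/dev/null
    wait "$dog_pid" 2>/dev/null
    return "$status"
}

run_with_timeout 2 sh -c 'exit 3'    # finishes well within the limit
quick_rc=$?
run_with_timeout 1 sleep 5           # overruns and is terminated
slow_rc=$?
echo "quick command: $quick_rc, slow command: $slow_rc"
```

The caller can tell a timeout from an ordinary failure because a job killed by a signal reports a status greater than 128.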
 
LVL 5

Expert Comment

by:zmo
ID: 22798166
Are you happy? :)
 

Author Comment

by:xewoox
ID: 22799181
I wrote a timeout script. Basically, it times out a command if the execution exceeds the limit. However, if the execution completes before the timeout, the script does not keep waiting.

In this script, I have the following lines:

NP=`ps uxc | grep -i -w "$CNAME" | awk '{print $2}' | wc -l`
echo $NP

Say I have an executable that does nothing but sleep for 30 seconds. If I run it in the background and then run another one, this script (with $CNAME set to my executable name) will echo 2.

However, if I have a shell script that does nothing but sleep for 30 seconds and run it the same way, this script (with $CNAME set to my shell script name) will echo 0.

I don't know why.

Say the executable is called testc and the shell script is called testsc. During their execution, if I do a "ps -ef | grep testc" I see

user1 .... ..... 0:00 testc

If I do a "ps -ef | grep testsc" I see

user1... .... 0:00 /bin/sh testsc

Why does that ps uxc command not return the number of processes for the shell script?
 
LVL 5

Accepted Solution

by:
zmo earned 500 total points
ID: 22799560
Hum... I'm sorry, but I'm leaving work now for the weekend, so I won't have much time to help you out with that. I'd advise you to post another question about it, so other experts can help you.

Hope that my help was useful to you, and cya next Monday ;)
 

Author Comment

by:xewoox
ID: 22799615
Thank you for your help. Have a good weekend.
 

Author Closing Comment

by:xewoox
ID: 31509347
Thank you so much.  I am able to fine tune your example to what I need to do. Thank you once again for all your help.