Solved

How to kick off a command in parallel on a list of machines and monitor the results

Posted on 2008-10-23
345 Views
Last Modified: 2013-12-26
Hi,

I posted a question a few days ago about how to kick off a command (ssh $host "command...") on a list of machines.

Someone suggested the following:

#!/bin/sh
for host in `cat /list/of/hosts`
do
  ssh $host "some command" &
done

That works. However, what do I need to do in the script to check whether the background processes are complete before moving on to the next step? Basically, I want each command (i.e. each background process) to complete before moving to the next step.
Question by:xewoox
38 Comments
 
LVL 68

Expert Comment

by:woolmilkporc
So leave out the & !
 
LVL 68

Expert Comment

by:woolmilkporc
Hi,
some explanation:
By terminating your command you create a background job. Control is immediately returned to the shell, so the next command can run.
By not terminating with a '&' the shell will wait until the command completes before starting the next one.
Greetings
wmp
 
LVL 68

Expert Comment

by:woolmilkporc
ouh,
the first sentence should read: ... terminating your command with a '&' ...
Don't know what happened there ...
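The difference wmp describes is easy to see with plain `sleep` commands standing in for the ssh calls (a minimal sketch, not the poster's actual hosts): two 2-second jobs started with `&` finish in roughly 2 seconds total, because the shell does not wait between launches.

```shell
#!/bin/sh
# Two background jobs run concurrently; `wait` blocks until both finish.
start=$(date +%s)
sleep 2 &     # stand-in for: ssh host1 "some command" &
sleep 2 &     # stand-in for: ssh host2 "some command" &
wait          # pause here until all background jobs are done
end=$(date +%s)
elapsed=$((end - start))
echo "elapsed: ${elapsed}s"   # ~2s in parallel; ~4s if run one after the other
```

Without the trailing `&`, each `sleep` would run in the foreground and the total time would be the sum of the individual times.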
 

Author Comment

by:xewoox
I can use $! and put all the process ids into a file. Then I can open up the file and check whether each pid is complete, one by one. I also need to somehow add a timeout... I am very new to writing scripts and don't know much about this. I am looking for someone who may have done it before and can provide some working examples, so I can modify them for my own purpose.

Thanks!
 
LVL 5

Expert Comment

by:zmo
But if you want to get to the next command for each host, and not wait for all the hosts to have completed (which is what would happen if you removed the &), you can do it two different ways depending on your needs.

#!/bin/sh
for host in `cat /list/of/hosts`
do
  ssh $host "some command" && <execute the next command here> &
done

or

#!/bin/sh
for host in `cat /list/of/hosts`
do
  ssh $host "some command" &
  PID=$!
  <execute something else here>
  wait $PID && <execute the next command here> &
done
 
LVL 5

Expert Comment

by:zmo
if you need any explanations, just ask ;)
 

Author Comment

by:xewoox
Thank you for helping me...

So, wait $PID will wait until the process is done? If it does, then we are not executing the commands in parallel. So I don't quite understand the example.

Now, if I want to give the command x amount of time to run, do I put a sleep before wait $PID?
 
LVL 68

Expert Comment

by:woolmilkporc
You could put a 'sleep x ;' before a 'kill $PID'
 

Author Comment

by:xewoox
Hi,

I need some explanation of how the two examples above run the commands in parallel.

Thanks!
 
LVL 68

Expert Comment

by:woolmilkporc
They don't
 
LVL 5

Expert Comment

by:zmo
hm... given the tests I ran, I was wrong, sorry :/

1/ ( sleep 10 && echo a ) &

is the fix for the first suggestion.

2/ is wrong; a fix I thought would be good is:

for host in `cat /list/of/hosts`
do
  ssh $host "some command" &
  PID=$!
  <execute something else here>
  ( wait $PID && <execute the next command here> ) &
done

but it does not work, because wait has to run in the same shell that started $PID :-S

> So, wait $PID will wait until the process is done? If it does, then we are not executing the commands in parallel. So I don't quite understand the example.

well yes, that's what it should have done.

So if I understand well what you want: launch all commands at the same time, i.e. the for loop executes all the ssh commands and ends, and when all are done, you execute the next command. Then wait for my next comment, I'm writing that for you.

for host in `cat /list/of/hosts`
do
    ( ssh $host "some command" && <execute the next command here> ) &
done
 

Author Comment

by:xewoox
Mmmm...

Now I am confused. I did a test with scp instead of ssh to copy a large file to 2 machines.

#!/bin/sh
for host in `cat /list/of/hosts`
do
  scp <a big file> $host:/tmp &
  PID=$!
  wait $PID && echo "PID=$PID is done"
done

As I am waiting, I can see the copying going on to both machines. Why are they not executing in parallel?
 

Author Comment

by:xewoox
Thanks... please ignore my previous comment above... Yes, the wait puts them in sequential operation...

I am waiting for your new script. :=}
 

Author Comment

by:xewoox
My previous comment about "I am confused" was a response to woolmilkporc.

My previous comment about "waiting for a new script" was a response to zmo.

Thank you for all your help.
 
LVL 5

Expert Comment

by:zmo
ok, here is something simple that works, is that what you want?

for host in `cat /list/of/hosts`
do
    ssh $host "some command" &
done

# this command waits until all the background jobs end
wait

# when all ssh are terminated, here you go
<execute next command>
 

Author Comment

by:xewoox
Yes.

However, is there a way we can capture the return code of the command?

Say I have a script (test) to drive the command. This script will return a value, say 0 or 1. Say the command is scp'ing a file to a remote machine. Please see below. The script "test" will return either 0 or 1. In your example, is there a way I can check the return code from the command running on the remote machine? I hope so.

for host in `cat /list/of/hosts`
do
    test scp file $host:/tmp &
done

# this command waits until all the background jobs end
wait

# when all ssh are terminated, here you go
<execute next command>
 

Author Comment

by:xewoox
Hi,

Your example doesn't seem to work.

I added an echo "Now we wait" before the "wait" and an echo "Wait is over" after the wait statement... The script exits and I see both echoes, but the copying is still in progress.
 
LVL 5

Expert Comment

by:zmo
well, the variable $? contains the return value of the last command. I don't know if ssh passes through the exit code of the command it executes, but I assume it does.

Now it depends on what you want to do with the return value.
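One way to pick up each background job's return code, sketched here with `( exit N )` subshells standing in for the ssh calls: within the same shell that launched them, `wait $pid` returns the exit status of that particular job, even if the job has already finished.

```shell
#!/bin/sh
# Collect each background job's exit status via `wait $pid`.
pids=""
for n in 0 1; do
    ( exit $n ) &            # stand-in for: ssh $host "some command" &
    pids="$pids $!"
done

results=""
for pid in $pids; do
    wait $pid                # returns that job's exit status
    results="$results $?"
done
echo "statuses:$results"     # prints "statuses: 0 1"
```

This only works in the shell that started the jobs, which matches zmo's earlier remark that `wait` has to run in the same shell as the launched process.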
 
LVL 5

Expert Comment

by:zmo
this is strange, I tried this snippet, and it worked...

for i in 5 6 7 8
do
    sleep $i &
done

echo "Now we wait"

# this command waits until all the background jobs end
wait

echo "Wait is over"

 

Author Comment

by:xewoox
Yes, I can use $?, but how do I get the $? for each command running on the remote machines? Where should I add code to get the $?
 
LVL 5

Expert Comment

by:zmo
hm... did you actually add "test" before the command?

Rename your script to test.sh or something else; "test" is a shell builtin command.
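The shadowing zmo mentions is easy to confirm: `type` reports what a command name resolves to, and `test` resolves to the builtin rather than a file in the current directory.

```shell
#!/bin/sh
# `test` is a shell builtin; a script file named "test" in the current
# directory is ignored unless invoked with an explicit path like ./test.
type test
```

Running this prints something like "test is a shell builtin", which is why a driver script named `test` silently does nothing useful.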
 

Author Comment

by:xewoox
Yes, sorry, your example does work. :=}

My command was just bad, so the process finished instantly... that's why I thought it didn't work. Sorry.

Anyway, any idea how to capture the rc from each command execution?
 

Author Comment

by:xewoox
No, my script is not called "test"... I just made that name up quickly for the post here.
 
LVL 5

Expert Comment

by:zmo
well, what do you want to do with the return value?
If one of the commands fails, skip the next step for every host?
If one of the commands fails, skip the next step only for the host that failed?
 

Author Comment

by:xewoox
No,

I want the command to run on each host, and I want them to run in parallel. At the end I need to know the result from each host, whether it succeeded or failed.
 
LVL 5

Expert Comment

by:zmo
ah ok, hm... let me put this up
 
LVL 5

Expert Comment

by:zmo
here it is:

for host in `cat /list/of/hosts`
do
    ssh $host "some command" && $host $? >> commands.log &
done

# this command waits until all the background jobs end
wait

# when all ssh are terminated, here you go
<execute next command>
 
LVL 5

Expert Comment

by:zmo
ooops, I forgot a "detail" :p

for host in `cat /list/of/hosts`
do
    ssh $host "some command" && echo $host $? >> commands.log &
done

# this command waits until all the background jobs end
wait

# when all ssh are terminated, here you go
<execute next command>
 
LVL 5

Expert Comment

by:zmo
Comment Utility
btw, if you want a better log output you can change the echo to what is below
for host in `cat /list/of/hosts`

do

    ssh $host "some command" && ( echo $host ": success" >> commands.log || echo $host ": failed" ) >> commands.log &

done

 

# this command waits until all the background jobs end

wait

 

# when all ssh are terminated, here you go

<execute next command>

Open in new window

0
 

Author Comment

by:xewoox
Thank you.

I will try this out later today. Got to go now.

Thank you again.
 

Author Comment

by:xewoox
Hi,

I tried it out.

It works.

Another question for you.

I want to write another script, A, to drive this script B (which does the parallel execution using the example you gave me above).

Script A will basically try to time out the execution of script B.

In script A, I will invoke script B as

"$@" > /dev/null &

Now, script B will issue those scp commands in parallel.

If I do a ps -ef | grep <script B>, I will see more than one occurrence of script B.

What command can I use to get the number of processes that script B is occupying?

I am thinking of checking this, and when the number of processes returned is 0, I know they are done. If the timeout is reached and the number of processes is not 0, I will kill the remaining processes.
 
LVL 5

Expert Comment

by:zmo
ok, here is what you want:

launch_commands.sh:

echo $(date +"%b %d %H:%M:%S ") "starting script" >> commands.log

for i in 10 9 8 7 3 -1
do
    sh timeout.sh ssh $host "some command" > /dev/null 2>&1 || echo $(date +"%b %d %H:%M:%S ") $i ": failed" >> commands.log &
done

# this command waits until all the background jobs end
wait

# when all ssh are terminated, here you go
<following commands>
 
LVL 5

Expert Comment

by:zmo
oops, in the previous post, of course the for line is wrong, it should be:

for host in `cat /list/of/hosts`

;)

and here is the "timeout.sh" script:

TIMEOUT=60 # sets a timeout of 60 seconds; change this to set the timeout
HOST=$2

( sleep $TIMEOUT && kill $$ && echo $(date +"%b %d %H:%M:%S ") $HOST ": timeout !" >> commands.log ) &

"$@" && echo $(date +"%b %d %H:%M:%S ") $HOST ": success" >> commands.log || echo $(date +"%b %d %H:%M:%S ") $HOST ": failed" >> commands.log
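The watchdog idea in timeout.sh can be sketched in isolation (using `sleep 0` as a stand-in for the ssh command, and a short TIMEOUT): a background subshell would kill this shell after TIMEOUT seconds, and is cancelled once the command finishes in time.

```shell
#!/bin/sh
TIMEOUT=5
CMD="sleep 0"                      # stand-in for: ssh $host "some command"

# Watchdog: kill this shell if the command is still running after TIMEOUT.
( sleep $TIMEOUT && kill $$ 2>/dev/null ) >/dev/null 2>&1 &
WATCHDOG=$!

if $CMD; then RESULT=success; else RESULT=failed; fi

kill $WATCHDOG 2>/dev/null         # command finished in time: stop the watchdog
echo "RESULT=$RESULT"
```

Cancelling the watchdog once the command completes avoids the stray `kill $$` firing after the script has moved on, which zmo's version leaves armed.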
 
LVL 5

Expert Comment

by:zmo
are you happy? :)
 

Author Comment

by:xewoox
I wrote a timeout script. Basically, it will time out a command if the execution exceeds the limit. However, if the execution completes before the timeout, then it is okay and the script won't wait.

In this script, I have the following line:

NP=`ps uxc | grep -i -w "$CNAME" | awk '{print $2}' | wc -l`
echo $NP

Say I have an executable (all it does is sleep 30 seconds). If I run this executable in the background and then run another one, this script ($CNAME in this case is my executable name) will echo 2.

However, if I have a shell script (all it does is sleep 30 seconds) and run it the same way, this script ($CNAME in this case is my shell script name) will return 0.

I don't know why.

Say the executable is called testc and the shell script is called testsc. During their execution, if I do a "ps -ef | grep testc" I will see

user1 .... ..... 0:00 testc

If I do a "ps -ef | grep testsc" I will see

user1... .... 0:00 /bin/sh testsc

Why does the ps uxc command not return the number of processes for the shell script?
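For what it's worth, the likely explanation (an editorial note, not confirmed in the thread): the `c` option makes ps print the executable's name, and a script runs as its interpreter (`/bin/sh testsc`), so the executable name is `sh`, not `testsc`, and grepping for the script name matches nothing. Matching the full command line, as `ps -ef` prints it, does work. A small sketch under that assumption:

```shell
#!/bin/sh
# Start a throwaway shell process; `ps -ef` shows its full command line,
# so we can count it even though `ps uxc` would only report "sh".
sh -c 'sleep 2' &
NP=$(ps -ef | grep "[s]leep 2" | wc -l)   # [s] trick keeps grep from matching itself
echo "NP=$NP"
kill $! 2>/dev/null                       # clean up the background process
```

Counting with the full command line (or with `pgrep -f scriptname`, where available) sidesteps the executable-name issue entirely.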
 
LVL 5

Accepted Solution

by:
zmo earned 500 total points
hum... I'm sorry, but I'm leaving work now for the weekend, so I won't have much time to help you out with that... I'd advise you to post another question about it, so other experts can help you...

hope that my help was useful to you, and cya next monday ;)
 

Author Comment

by:xewoox
Thank you for your help. Have a good weekend.
 

Author Closing Comment

by:xewoox
Thank you so much. I was able to fine-tune your example to what I need to do. Thank you once again for all your help.
