Explain bash command

Please explain in detail what this line is doing.

xargs -a $tmpdir/missing -P 20 -L 1 -I '{}' /bin/bash -c 'do_distcp "$@"' _ {}


-a $tmpdir/missing

xargs reads its items from the specified file rather than from stdin. (-a is a GNU xargs option.)

The most common usage of xargs is in a pipe, e.g.:

find . -name "*.txt" | xargs .....
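
As a rough sketch of the difference (the temporary file and its contents are made up for illustration), both forms below hand the same items to echo:

```shell
# Hypothetical input file; name and contents are for illustration only.
list=$(mktemp)
printf '%s\n' alpha beta gamma > "$list"

# Read the items from the file with -a (GNU xargs) ...
from_file=$(xargs -a "$list" echo)

# ... or feed the same file to xargs on stdin.
from_stdin=$(xargs echo < "$list")

echo "$from_file"
echo "$from_stdin"
rm -f "$list"
```

One practical difference: with -a, xargs does not consume stdin itself, so stdin stays available to the commands it runs.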

-P 20

Run up to 20 processes at a time.
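
A small sketch of the effect, using sleep as a stand-in workload (not part of the original script): four one-second jobs finish in roughly one second when up to four may run at once.

```shell
start=$(date +%s)
# Four input lines, one "sleep 1" per line (-n 1), up to 4 running in parallel.
printf '%s\n' 1 1 1 1 | xargs -P 4 -n 1 sleep
end=$(date +%s)
elapsed=$((end - start))
echo "elapsed: ${elapsed}s"
```

With -P 1 the same input would take about four seconds. Parallel jobs can finish in any order.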

-L 1

Use at most 1 non-blank input line per command line. I think it was specified to make sure each line of the 'missing' file gets its own do_distcp invocation, but -I already implies -L 1, so it is redundant here (xargs treats -L and -I as mutually exclusive and uses whichever comes last). Note that with -P 20 the commands run concurrently, so the order in which they finish is not guaranteed either way.

-I '{}'

Same as the deprecated -i option, except that -I requires the replacement string to be given. The quotes aren't needed; it can simply be -I{}
This specifies the replacement string: each occurrence of it in the command is replaced with an item read from stdin (or in this case, from the 'missing' file).

You can use any unique string; {} is just the conventional choice (and the default for -i).
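
For instance, a minimal sketch with a different placeholder (ITEM is an arbitrary string chosen for illustration):

```shell
# With -I, each input line is one item, even if it contains spaces.
out=$(printf '%s\n' 'one' 'two words' | xargs -I ITEM echo 'got [ITEM]')
echo "$out"
```

Each input line produces its own echo invocation, with ITEM replaced by the whole line.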

/bin/bash -c

Start bash and have it run the command string that follows the -c flag. Any words after that string become bash's positional parameters ($0, $1, ...).

'do_distcp "$@"' _ {}

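Run the do_distcp command, passing along bash's positional parameters. Because -I hands each input line to bash as one parameter, do_distcp receives the whole line as a single argument.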

This construct is a little unusual, so hopefully this example will help.

Say the file missing contained the line

AA BB CC

If the command was instead:

'do_distcp {}'

xargs would substitute the line into the command string, and bash would then parse it as

do_distcp AA BB CC

i.e. passing 3 separate arguments

By using:

'do_distcp "$@"' _ {}

It populates $@ with the contents of {} as a single positional parameter ($1), so the command effectively run is

do_distcp 'AA BB CC'

i.e. a single argument is passed
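
Here is a small sketch of the two forms side by side. It uses printf as a stand-in for do_distcp (an assumption for illustration), since printf repeats its format once per argument and so shows how many arguments arrived:

```shell
line='AA BB CC'

# Placeholder embedded in the command string: bash re-parses the text,
# so the line is split into three arguments.
inline=$(printf '%s\n' "$line" | xargs -I{} bash -c 'printf "<%s>\n" {}')

# Placeholder passed as a positional parameter: "$@" keeps it as one argument.
positional=$(printf '%s\n' "$line" | xargs -I{} bash -c 'printf "<%s>\n" "$@"' _ {})

echo "$inline"
echo "$positional"
```

The first form prints three lines, the second a single `<AA BB CC>`. The second form is also safer: in the first, any shell metacharacters in the input line would be interpreted by bash.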

Does that help?

farzanj (Author) commented:
Thanks for your help.

What is the meaning of underscore in 'do_distcp "$@"' _ {}  ?

What is generally the meaning of - in commands like
tar xf -

Any other examples of cases like these that you can think of ?
I'm not 100% sure of the meaning of the underscore as it's a syntax I hadn't seen before, but I think it is specific to xargs and a way of passing or joining the arguments.

The - on the other hand is used to specify stdin as the filename.
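
A quick sketch of that convention with tar (the temporary directories and file name are made up for illustration): the first tar writes the archive to stdout and the second reads it from stdin.

```shell
src=$(mktemp -d)
dest=$(mktemp -d)
echo hello > "$src/file.txt"

# "-f -" means stdout for the creating tar and stdin for the extracting tar.
tar -C "$src" -cf - . | tar -C "$dest" -xf -

copied=$(cat "$dest/file.txt")
echo "$copied"
rm -rf "$src" "$dest"
```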

Since the _ will be passed to do_distcp, you should ask the author of do_distcp what it does with _, consult the documentation for do_distcp, or show us the code of do_distcp.
farzanj (Author) commented:
Increasing points :)

Ok, here's the script.  It is copying files from one Hadoop cluster to another using one of Hadoop's library functions.



ARGS="$ARGS -Dmapred.job.queue.name=bdslogging"
ARGS="$ARGS -Ddfs.nameservices=nameservice1,nameservice2"
ARGS="$ARGS -Ddfs.ha.namenodes.nameservice2=nn1,nn2"
ARGS="$ARGS -Ddfs.namenode.rpc-address.nameservice2.nn1=r3m1.hadoop.log5.blackberry:8020"
ARGS="$ARGS -Ddfs.namenode.rpc-address.nameservice2.nn2=r7m1.hadoop.log5.blackberry:8020"
ARGS="$ARGS -Ddfs.client.failover.proxy.provider.nameservice2=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"

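do_distcp() {
  # Assumed: the posted script does not show where dir is set; taking it from
  # the first argument matches how xargs invokes the function below.
  dir=$1
  echo $dir/_SUCCESS does not exist remotely. Copying.
  # Clear the remote directory, copy into a _tmp staging directory, then move
  # the part files and the _SUCCESS marker into place.
  hdfs dfs $ARGS -rm -R "hdfs://nameservice2/$dir/*"
  hdfs dfs $ARGS -mkdir "hdfs://nameservice2/$dir/_tmp" &&
    mapred distcp $ARGS -m 10 -overwrite hdfs://nameservice1/$dir hdfs://nameservice2/$dir/_tmp &&
    hdfs dfs $ARGS -mv "hdfs://nameservice2/$dir/_tmp/part*" "hdfs://nameservice2/$dir" &&
    hdfs dfs $ARGS -mv "hdfs://nameservice2/$dir/_tmp/_SUCCESS" "hdfs://nameservice2/$dir" &&
    hdfs dfs $ARGS -rm -R "hdfs://nameservice2/$dir/_tmp"
}

(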
  flock --exclusive --nonblock 200 || exit 1
  export ARGS
  export -f do_distcp

  tmpdir=`mktemp -d`

  touch $tmpdir/missing

  hdfs dfs $ARGS -ls "hdfs://nameservice1/$LOGPATH" | grep _SUCCESS | grep -v 'tmp' | perl -pe 's{.*hdfs://nameservice1(.*)/_SUCCESS}{$1}' > $tmpdir/local
  hdfs dfs $ARGS -ls "hdfs://nameservice2/$LOGPATH" | grep _SUCCESS | grep -v 'tmp' | perl -pe 's{.*hdfs://nameservice2(.*)/_SUCCESS}{$1}' > $tmpdir/remote

  # Record every local path that has no matching entry in the remote listing.
  for file in `cat $tmpdir/local`
  do
    grep "$file" $tmpdir/remote >/dev/null || echo $file >> $tmpdir/missing
  done

  xargs -a $tmpdir/missing -P 20 -L 1 -I '{}' /bin/bash -c 'do_distcp "$@"' _ {}

  rm -rf $tmpdir
) 200> /tmp/bbds.distcp.lock


Since do_distcp does not appear to use $0, it looks like the _ is merely a placeholder so that the {} arg will go into $1. With bash -c, the first word after the command string is assigned to $0 and the following words to $1, $2, and so on; "$@" expands to $1 onward only.
"- on the other hand is used to specify stdin as the filename."
tar xf is using it that way, and several other programs use that convention, but in other contexts it can be used to specify stdout, or whatever the program that is using it decides that it means.
In general, see the man page for the program to which it is being passed to see what it means in the context it is being used.
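
To make the placeholder role of _ concrete, here is a minimal sketch (the argument "first" is arbitrary) comparing bash -c with and without it:

```shell
# With the placeholder: _ lands in $0, so "first" becomes $1.
with_placeholder=$(bash -c 'echo "0=$0 1=$1 all=$*"' _ first)

# Without it: "first" is swallowed by $0 and "$@" / $* are empty.
without_placeholder=$(bash -c 'echo "0=$0 1=$1 all=$*"' first)

echo "$with_placeholder"
echo "$without_placeholder"
```

Any word would do in place of _; it is just a common convention for a value nobody intends to read.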
farzanj (Author) commented:
Thank you.