Explain bash command

Posted on 2013-09-19
Medium Priority
Last Modified: 2013-10-07
Please explain in detail what this line is doing.

xargs -a $tmpdir/missing -P 20 -L 1 -I '{}' /bin/bash -c 'do_distcp "$@"' _ {}


Question by:farzanj
LVL 48

Accepted Solution

Tintin earned 1200 total points
ID: 39507840
-a $tmpdir/missing

xargs reads its arguments from the specified file rather than stdin.  

The most common usage of xargs is in a pipe, eg:

find . -name "*.txt" | xargs .....
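As a rough illustration of the two input styles, the sketch below feeds the same three lines to xargs once with -a and once on stdin. The temporary file and its contents are made up for the example, and -a is a GNU xargs option, so this assumes GNU findutils.

```shell
# Hypothetical input file standing in for $tmpdir/missing.
list=$(mktemp)
printf 'one\ntwo\nthree\n' > "$list"

# Read the items from the file named by -a ...
from_file=$(xargs -a "$list" echo)
# ... or from standard input, as in a pipe.
from_pipe=$(xargs echo < "$list")

echo "$from_file"   # one two three
echo "$from_pipe"   # one two three
rm -f "$list"
```

Both forms build the same command line; the difference is only where xargs gets its items from.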

-P 20

Run up to 20 processes at a time.
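A small timing sketch of what -P buys you, using sleep as a stand-in for the real work (the three one-second jobs are invented for the example):

```shell
start=$(date +%s)
# Three jobs that each take one second, run up to three at a time.
printf '1\n1\n1\n' | xargs -P 3 -I{} sleep {}
elapsed=$(( $(date +%s) - start ))
echo "elapsed: ${elapsed}s"
```

With -P 3 the jobs overlap, so the total should be close to one second rather than three. Without -P (or with -P 1) they would run one after another.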

-L 1

Use at most 1 non-blank input line per command line. With GNU xargs this is redundant here, because -I already implies -L 1, so each line of the 'missing' file becomes its own command either way. Note that it does not preserve ordering: with up to 20 processes running in parallel, the commands may finish in any order.
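To see the effect of -L 1 on its own, here is a minimal sketch with two made-up input lines:

```shell
# Default: xargs packs as many items as it can into one command line.
packed=$(printf 'a b\nc d\n' | xargs echo)
# -L 1: one command per input line.
per_line=$(printf 'a b\nc d\n' | xargs -L 1 echo)

echo "$packed"     # a b c d
echo "$per_line"   # a b
                   # c d
```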

-I '{}'

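Same as the deprecated -i option, except that -I requires the replacement string to be given. It also doesn't need the quotes; it can be simply -I{}
This specifies the replacement string for the arguments fed from stdin (or in this case, from the 'missing' file). Every occurrence of {} in the command is replaced with the whole input line.

You can use any unique string; {} is the conventional choice and is the default for -i.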
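A short sketch of how -I changes what the command receives. The printf format with brackets is only there to make the argument boundaries visible, and the input line is made up:

```shell
# With -I the whole line replaces {} and arrives as ONE argument.
with_I=$(printf 'AA BB CC\n' | xargs -I{} printf '[%s]\n' {})
# Without -I the line is split on blanks into THREE arguments.
without_I=$(printf 'AA BB CC\n' | xargs printf '[%s]\n')

echo "$with_I"      # [AA BB CC]
echo "$without_I"   # [AA]
                    # [BB]
                    # [CC]
```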

/bin/bash -c

Start a new bash and run the command string that follows the -c flag. Any arguments after that string become $0, $1, ... inside it.
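The reason for going through bash at all is that xargs can only execute programs, not shell functions. A function becomes reachable if it is exported with export -f and called from a child bash. The sketch below assumes bash as the calling shell; greet is a hypothetical stand-in for do_distcp.

```shell
#!/usr/bin/env bash
# greet is a made-up function standing in for do_distcp.
greet() { printf 'hello %s\n' "$1"; }
export -f greet   # bash-only: make the function visible to child bash processes

# xargs starts bash, and that bash finds the exported function.
greeting=$(printf 'world\n' | xargs -I{} bash -c 'greet "$@"' _ {})
echo "$greeting"   # hello world
```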

'do_distcp "$@"' _ {}

Run the do_distcp command with the arguments that follow the command string. Because -I substitutes the whole input line for {}, that line is passed on as a single argument.

This construct is a little unusual, so hopefully this example will help.

Say the file missing contained the line

AA BB CC

If the command was instead:

'do_distcp {}'

xargs would construct the command as

do_distcp AA BB CC

ie: passing 3 separate arguments

By using:

'do_distcp "$@"' _ {}

It populates $@ with the contents of {}, so the command constructed is

do_distcp 'AA BB CC'

ie: a single argument is passed
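The two forms can be compared directly. In this sketch printf with brackets stands in for do_distcp so the argument boundaries are visible; the input line is the AA BB CC example from above.

```shell
# {} pasted into the command string: bash re-parses the text and splits it.
embedded=$(printf 'AA BB CC\n' | xargs -I{} bash -c 'printf "[%s]" {}; echo')
# {} passed as a separate argument and picked up via "$@": stays intact.
positional=$(printf 'AA BB CC\n' | xargs -I{} bash -c 'printf "[%s]" "$@"; echo' _ {})

echo "$embedded"     # [AA][BB][CC]
echo "$positional"   # [AA BB CC]
```

A side benefit of the second form is that the input is never parsed as shell code, so characters such as ; or $ in a line of the file are passed through as plain data.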

Does that help?
LVL 31

Author Comment

ID: 39540492
Thanks for your help.

What is the meaning of underscore in 'do_distcp "$@"' _ {}  ?

What is generally the meaning of - in commands like
tar xf -

Any other examples of cases like these that you can think of ?
LVL 48

Expert Comment

ID: 39542302
I'm not 100% sure of the meaning of the underscore as it's a syntax I hadn't seen before, but I think it is specific to xargs and a way of joining the arguments.

The - on the other hand is used to specify stdin as the filename.
LVL 85

Expert Comment

ID: 39542406
since the _ will be passed to do_distcp, you should ask the author of do_distcp what it does with _ or consult the documentation for do_distcp, or show us the code of do_distcp
LVL 31

Author Comment

ID: 39542765
Increasing points :)

Ok, here's the script.  It is copying files from one Hadoop cluster to another using one of Hadoop's library functions.



ARGS="$ARGS -Dmapred.job.queue.name=bdslogging"
ARGS="$ARGS -Ddfs.nameservices=nameservice1,nameservice2"
ARGS="$ARGS -Ddfs.ha.namenodes.nameservice2=nn1,nn2"
ARGS="$ARGS -Ddfs.namenode.rpc-address.nameservice2.nn1=r3m1.hadoop.log5.blackberry:8020"
ARGS="$ARGS -Ddfs.namenode.rpc-address.nameservice2.nn2=r7m1.hadoop.log5.blackberry:8020"
ARGS="$ARGS -Ddfs.client.failover.proxy.provider.nameservice2=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"

do_distcp() {
  # The directory to copy arrives as the first argument (one line of
  # $tmpdir/missing). This assignment is assumed; it is not in the script
  # as posted.
  local dir="$1"
  echo "$dir/_SUCCESS does not exist remotely. Copying."
  hdfs dfs $ARGS -rm -R "hdfs://nameservice2/$dir/*"
  hdfs dfs $ARGS -mkdir "hdfs://nameservice2/$dir/_tmp" &&
    mapred distcp $ARGS -m 10 -overwrite "hdfs://nameservice1/$dir" "hdfs://nameservice2/$dir/_tmp" &&
    hdfs dfs $ARGS -mv "hdfs://nameservice2/$dir/_tmp/part*" "hdfs://nameservice2/$dir" &&
    hdfs dfs $ARGS -mv "hdfs://nameservice2/$dir/_tmp/_SUCCESS" "hdfs://nameservice2/$dir" &&
    hdfs dfs $ARGS -rm -R "hdfs://nameservice2/$dir/_tmp"
}

# Run everything under an exclusive lock on file descriptor 200 so that
# only one instance of the script runs at a time.
(
  flock --exclusive --nonblock 200 || exit 1
  # Export both so the bash processes started by xargs can see them.
  export ARGS
  export -f do_distcp

  tmpdir=`mktemp -d`

  touch $tmpdir/missing

  # Directories that have a _SUCCESS marker on each cluster.
  hdfs dfs $ARGS -ls "hdfs://nameservice1/$LOGPATH" | grep _SUCCESS | grep -v 'tmp' | perl -pe 's{.*hdfs://nameservice1(.*)/_SUCCESS}{$1}' > $tmpdir/local
  hdfs dfs $ARGS -ls "hdfs://nameservice2/$LOGPATH" | grep _SUCCESS | grep -v 'tmp' | perl -pe 's{.*hdfs://nameservice2(.*)/_SUCCESS}{$1}' > $tmpdir/remote

  # Anything present locally but not remotely still has to be copied.
  for file in `cat $tmpdir/local`
  do
    grep "$file" $tmpdir/remote >/dev/null || echo $file >> $tmpdir/missing
  done

  xargs -a $tmpdir/missing -P 20 -L 1 -I '{}' /bin/bash -c 'do_distcp "$@"' _ {}

  rm -rf $tmpdir
) 200> /tmp/bbds.distcp.lock


LVL 85

Assisted Solution

ozo earned 800 total points
ID: 39542797
Since do_distcp does not appear to use $0, it looks like the _ is merely a placeholder so that the {} arg will go into $1
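This is easy to check outside xargs. With bash -c, the first argument after the command string becomes $0 and the rest become $1, $2, and so on; "$@" covers only $1 onward. A minimal sketch (AA is just a sample value):

```shell
# With the placeholder: _ takes the $0 slot, AA lands in $1.
with_placeholder=$(bash -c 'printf "0=%s 1=%s\n" "$0" "$1"' _ AA)
# Without it: AA is swallowed by $0 and $1 is empty.
without_placeholder=$(bash -c 'printf "0=%s 1=%s\n" "$0" "$1"' AA)

echo "$with_placeholder"      # 0=_ 1=AA
echo "$without_placeholder"   # 0=AA 1=
```

Any word would do in place of _; it only fills the $0 position.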
LVL 85

Assisted Solution

ozo earned 800 total points
ID: 39542842
"- on the other hand is used to specify stdin as the filename."
tar xf is using it that way, and several other programs follow the same convention,
but in other contexts it can mean stdout, or whatever the program that receives it decides it means.
In general, see the man page of the program it is being passed to in order to find out what it means in that context.
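tar itself shows both meanings. In the sketch below, tar cf - writes the archive to stdout and tar xf - reads it from stdin, which copies a directory through a pipe. The temporary directories and file.txt are made up for the example.

```shell
src=$(mktemp -d)
dest=$(mktemp -d)
echo hello > "$src/file.txt"

# First tar: "-" means stdout. Second tar: "-" means stdin.
tar cf - -C "$src" . | tar xf - -C "$dest"

copied=$(cat "$dest/file.txt")
echo "$copied"   # hello
rm -rf "$src" "$dest"
```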
LVL 31

Author Closing Comment

ID: 39552118
Thank you.
