farzanj asked:

Explain bash command

Please explain in detail what this line is doing.

xargs -a $tmpdir/missing -P 20 -L 1 -I '{}' /bin/bash -c 'do_distcp "$@"' _ {}


ASKER CERTIFIED SOLUTION
Tintin

This solution is only available to Experts Exchange members.
farzanj

ASKER

Thanks for your help.

What is the meaning of the underscore in 'do_distcp "$@"' _ {}?

What is the general meaning of - in commands like
tar xf -

Can you think of any other examples like these?
Tintin

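I hadn't seen the underscore syntax before, but it is not specific to xargs; it comes from bash -c. The first word after the command string becomes $0 of the new shell, and the words after that become $1, $2, and so on. The _ is a throwaway value for $0, so that {} lands in $1 and is picked up by "$@".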
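A minimal sketch of how bash -c assigns its positional parameters, which is what decides the role of the underscore here. The words first and second are made-up arguments, used only for illustration.

```shell
#!/bin/bash
# With `bash -c 'script' a b c`, the word right after the script string
# becomes $0; the remaining words become $1, $2, ...
out_with=$(bash -c 'echo "0=$0 args=$*"' _ first second)
echo "$out_with"      # 0=_ args=first second

# Without a placeholder, the first real argument is swallowed as $0
# and no longer shows up in "$@" or "$*".
out_without=$(bash -c 'echo "0=$0 args=$*"' first second)
echo "$out_without"   # 0=first args=second
```

Any word would do in place of _; the underscore is just a common convention for a value nobody intends to read.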

The -, on the other hand, is used to specify stdin as the filename (or stdout, when the command is writing, as in tar cf -).
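The stdin/stdout convention for - can be tried with a small tar pipeline. This is only a sketch: the scratch directories and the file name greeting.txt are made up for the demonstration.

```shell
#!/bin/bash
# Hypothetical scratch directories, created only for this demonstration.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "hello" > "$src/greeting.txt"

# `tar cf -` writes the archive to stdout; `tar xf -` reads it from stdin.
tar cf - -C "$src" greeting.txt | tar xf - -C "$dst"

copied=$(cat "$dst/greeting.txt")
echo "$copied"

# cat follows the same convention: - stands for stdin among its file arguments.
combined=$(echo "from stdin" | cat - "$src/greeting.txt")
echo "$combined"

rm -rf "$src" "$dst"
```

Not every command honors -, so it is worth checking the man page of the tool in question.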
Since _ only fills $0, it is not part of "$@" and is never passed to do_distcp. To find out what do_distcp does with the argument it does receive, ask its author, consult its documentation, or show us the code of do_distcp.
farzanj

ASKER

Increasing points :)

Ok, here's the script. It copies files from one Hadoop cluster to another using one of Hadoop's library functions.

#!/bin/bash

LOGPATH='/user/relay_rpt/BBDS-DB/*/logdb/*/p_dc=6/p_date=*/p_hour=*/_SUCCESS'

ARGS=""
ARGS="$ARGS -Dmapred.job.queue.name=bdslogging"
ARGS="$ARGS -Ddfs.nameservices=nameservice1,nameservice2"
ARGS="$ARGS -Ddfs.ha.namenodes.nameservice2=nn1,nn2"
ARGS="$ARGS -Ddfs.namenode.rpc-address.nameservice2.nn1=r3m1.hadoop.log5.blackberry:8020"
ARGS="$ARGS -Ddfs.namenode.rpc-address.nameservice2.nn2=r7m1.hadoop.log5.blackberry:8020"
ARGS="$ARGS -Ddfs.client.failover.proxy.provider.nameservice2=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"

do_distcp() {
  dir=$1
  echo $dir/_SUCCESS does not exist remotely. Copying.
  hdfs dfs $ARGS -rm -R "hdfs://nameservice2/$dir/*"
  hdfs dfs $ARGS -mkdir "hdfs://nameservice2/$dir/_tmp" &&
    mapred distcp $ARGS -m 10 -overwrite hdfs://nameservice1/$dir hdfs://nameservice2/$dir/_tmp &&
    hdfs dfs $ARGS -mv "hdfs://nameservice2/$dir/_tmp/part*" "hdfs://nameservice2/$dir" &&
    hdfs dfs $ARGS -mv "hdfs://nameservice2/$dir/_tmp/_SUCCESS" "hdfs://nameservice2/$dir" &&
    hdfs dfs $ARGS -rm -R "hdfs://nameservice2/$dir/_tmp"
}

(
  flock --exclusive --nonblock 200 || exit 1
  export ARGS
  export -f do_distcp

  tmpdir=`mktemp -d`

  touch $tmpdir/missing

  hdfs dfs $ARGS -ls "hdfs://nameservice1/$LOGPATH" | grep _SUCCESS | grep -v 'tmp' | perl -pe 's{.*hdfs://nameservice1(.*)/_SUCCESS}{$1}' > $tmpdir/local
  hdfs dfs $ARGS -ls "hdfs://nameservice2/$LOGPATH" | grep _SUCCESS | grep -v 'tmp' | perl -pe 's{.*hdfs://nameservice2(.*)/_SUCCESS}{$1}' > $tmpdir/remote

  for file in `cat $tmpdir/local`
  do
    grep "$file" $tmpdir/remote >/dev/null || echo $file >> $tmpdir/missing
  done

  xargs -a $tmpdir/missing -P 20 -L 1 -I '{}' /bin/bash -c 'do_distcp "$@"' _ {}

  rm -rf $tmpdir
) 200> /tmp/bbds.distcp.lock
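For anyone who wants to try the xargs pattern without a Hadoop cluster, here is a minimal sketch. It assumes GNU xargs (for -a). The function do_work and the /logs/... paths are hypothetical stand-ins for do_distcp and the entries in $tmpdir/missing. The -L 1 from the original line is left out because -I already implies one input line per command.

```shell
#!/bin/bash
# Stand-in for do_distcp (hypothetical): prints its first argument instead of copying.
do_work() {
  echo "processing $1"
}
export -f do_work   # make the function visible to the child bash started by xargs

list=$(mktemp)      # plays the role of $tmpdir/missing
printf '%s\n' /logs/a /logs/b /logs/c > "$list"

# -a FILE : read items from FILE instead of stdin (GNU xargs)
# -P 20   : run up to 20 commands in parallel
# -I '{}' : run one command per input line, replacing {} with that line
# _       : fills $0 of the child shell, so {} arrives as $1 and "$@" passes it on
result=$(xargs -a "$list" -P 20 -I '{}' bash -c 'do_work "$@"' _ {} | sort)
echo "$result"

rm -f "$list"
```

The output is sorted only because the parallel jobs can finish in any order. Without export -f, the child bash would not know do_work and every invocation would fail with "command not found".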


farzanj

ASKER

Thank you.