Solved

Bash Script Output

Posted on 2012-03-20
Medium Priority
609 Views
Last Modified: 2012-03-21
Hi

I currently have a script which, ideally, I'd like to run 1000 times by submitting jobs to a queue on a Linux cluster using qsub.

The script is called pipeline.sh.

I've tried submitting it with qsub 10 times from the command line at the same time. All ten jobs fell over, so I suspect each pipeline instance is attempting to write to the same file somewhere.

I need to control where the output goes for each run (TMP files and any output files). If the output location can't be changed (e.g. if everything just gets dumped to the current working directory), would it be possible to create a set of temporary directories as part of a bash script?

Here's what I'm thinking:

e.g. a control script that submits the jobs to the queue:

#!/bin/bash
# Submit one job per run directory so each job writes into its own directory
runDirs="dir1 dir2 dir3 dir4"
for dir in $runDirs
do
       cd "$dir"
       qsub -cwd -b y -V -q node.q -N name bash script.sh
       cd ..
done
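To also create the run directories automatically (the "set of temporary directories" idea above), here is a minimal sketch; the run1..run1000 directory names and submitting pipeline.sh from one level up are assumptions for illustration, not taken from the attached scripts:

#!/bin/bash
# Make a fresh scratch directory per run and submit each job from inside it,
# so qsub -cwd lands that job's output files there
for i in $(seq 1 1000)
do
       dir=run$i
       mkdir -p "$dir"
       (cd "$dir" && qsub -cwd -b y -V -q node.q -N "name$i" bash ../pipeline.sh)
done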

I basically want to submit multiple jobs to a queue without the runs tripping up over one another.

I've attached the scripts which I'm currently working with.

Thanks,

Stephen.
get-candidates.txt
pipeline.sh
statistics.txt
Question by: StephenMcGowan
4 Comments
 
LVL 84

Accepted Solution

by: ozo (earned 1500 total points)
ID: 37741425
You might try changing
TMP=~/permanalysis/tmp
to
TMP=~/permanalysis/tmp.$$
mkdir $TMP
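For what it's worth, $$ expands to the PID of the shell running the script; PIDs are unique on one machine but can in principle repeat across cluster nodes. A slightly more defensive sketch (mktemp and the trap cleanup are assumptions on my part, not from the thread):

# mktemp -d creates the scratch directory atomically with a guaranteed-unique name
TMP=$(mktemp -d ~/permanalysis/tmp.XXXXXX)
# remove the scratch directory when the script exits, even after a failure
trap 'rm -rf "$TMP"' EXIT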
 

Author Comment

by: StephenMcGowan
ID: 37741790
Hi ozo,

Thanks for getting back to me.

TMP=~/permanalysis/tmp.$$
mkdir $TMP

This created a temp folder for each job in my permanalysis folder, e.g. "tmp.17906".

I ran 10 jobs from the command line at the same time to test your modified code.
Each job failed for different reasons.

cannot find FILE:

formatdb host and controls ...
Run fastacutter.pl ..

cannot find FILE at /fs/nas15/home/mqbpgsm4/permanalysis/bin/fastacutter.pl line 9.
chmod: cannot access `/fs/nas15/home/mqbpgsm4/permanalysis/tmp.8975/input_for_pipeline.sh': No such file or directory
get molecular mimicry candidates ...
pipeline.sh: line 215: /fs/nas15/home/mqbpgsm4/permanalysis/tmp.8975/input_for_pipeline.sh: No such file or directory
cat: /fs/nas15/home/mqbpgsm4/permanalysis/tmp.8975/B_burgdorferi*-in: No such file or directory
cat: /fs/nas15/home/mqbpgsm4/permanalysis/tmp.8975/B_burgdorferi*-out: No such file or directory
grep: /fs/nas15/home/mqbpgsm4/permanalysis/tmp.8975/B_burgdorferi*-peptides: No such file or directory
grep: /fs/nas15/home/mqbpgsm4/permanalysis/tmp.8975/B_burgdorferi*-peptidesincontrol: No such file or directory
clean up tmp ...
Job B_burgdorferi completed...
rm: cannot remove `/fs/nas15/home/mqbpgsm4/permanalysis/data/proteomes/B_burgdorferi.fasta': No such file or directory


Is this because the same script (/fs/nas15/home/mqbpgsm4/permanalysis/bin/fastacutter.pl) is currently being run by a different job? Would a temporary folder need to be set up for the /bin/ directory where the scripts are held, or am I understanding this wrong?

No string found in /fs/nas15/home/mqbpgsm4/permanalysis/tmp.21749/B_burgdorferi1-finalids:

formatdb host and controls ...
Run fastacutter.pl ..
get molecular mimicry candidates ...
Working on B_burgdorferi1
Blast against control species
Separate fulllength conserved/nonconserved proteins
Ungapped Blast parasite 14mers against control species
Filter peptides control species
Ungapped Blast against host/vector proteome
Filter peptides host/vector

No string to search with was found in /fs/nas15/home/mqbpgsm4/permanalysis/tmp.21749/B_burgdorferi1-finalids. Nothing to do!
Calculation of Shannon entropy
clean up tmp ...
Job B_burgdorferi completed...


Thanks again,

Stephen.
 

Author Comment

by: StephenMcGowan
ID: 37745402
I think for:

cannot find FILE at /fs/nas15/home/mqbpgsm4/permanalysis/bin/fastacutter.pl line 9.

It is because PROTEOMEDIR is still pointing at a global folder used by all jobs, but it needs to be job-specific:

if [ -z "$PROTEOMEDIR" ]
then
    PROTEOMEDIR=~/permanalysis/data/proteomes
fi
Shuffleseq needs to pick up Original.fasta from the PROTEOMEDIR shown above, BUT it needs to write its output (-outseq) into the specific job folder (this can be the TMP folder created for the job, as above).
Phobius needs to pick up $PARASITE.fasta from the job's tmp folder (where Shuffleseq wrote it), and its output should go back into the job's tmp folder.


How the code looks at the moment:

#Shuffleseq
echo Use shuffleseq to create shuffled proteome
$SHUFFLESEQ -sequence $PROTEOMEDIR/Original.fasta -outseq $PROTEOMEDIR/B_burgdorferi.fasta
#phobius
echo Run phobius ...
$PHOBIUSBIN -short $PROTEOMEDIR/$PARASITE.fasta > $TMP/$PARASITE.phobius
$BINDIR/adaptphobius.pl $PARASITE

How I think it should look (apologies for any mistakes):

#Shuffleseq
echo Use shuffleseq to create shuffled proteome
# Read the shared Original.fasta, but write the shuffled copy into this job's TMP
$SHUFFLESEQ -sequence $PROTEOMEDIR/Original.fasta -outseq $TMP/B_burgdorferi.fasta
#phobius
echo Run phobius ...
# Phobius now reads from and writes to the job-specific TMP only
$PHOBIUSBIN -short $TMP/$PARASITE.fasta > $TMP/$PARASITE.phobius
$BINDIR/adaptphobius.pl $PARASITE

Would this work? I'm trying to keep everything 'job specific' so that if the same script is run multiple times there won't be any clashes. As it is at the moment, I can see many B_burgdorferi files being created ($SHUFFLESEQ -sequence $PROTEOMEDIR/Original.fasta -outseq $PROTEOMEDIR/B_burgdorferi.fasta) in PROTEOMEDIR=~/permanalysis/data/proteomes.
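One further tweak along the same lines (a suggestion, not from the attached scripts): derive the shuffleseq output name from $PARASITE instead of hard-coding B_burgdorferi, so the line matches whatever name the Phobius step later reads. This assumes $PARASITE is set earlier in pipeline.sh:

# write the shuffled proteome under the job's parasite name, inside the job's own TMP
$SHUFFLESEQ -sequence $PROTEOMEDIR/Original.fasta -outseq $TMP/$PARASITE.fasta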

Thanks again,

Stephen
 

Author Closing Comment

by: StephenMcGowan
ID: 37746568
This solved the immediate problem I had, but it in turn led to another problem.

I posted a second question in the same thread but got no response.

Therefore opened a new question:

http://www.experts-exchange.com/Programming/Languages/Scripting/Perl/Q_27641894.html

Thanks,

Stephen.
