

looping in a bash script

Posted on 2014-08-02
Medium Priority
Last Modified: 2014-08-17
I have been doing this for hours and I have just given up:...

The task: loop through a folder of files and run 6 scripts on each file, all from one shell script.

Each time a file is processed by one script, the result is put in another folder. The intention is to pass the resulting file name to the next script. I don't want to keep the folder path or the original extension (.txt or whatever), just the file name and the new file extension. Can anyone help?

The scripts are as follows (on a Mac, as a shell script):

for each file as $file

$ gzcat $file.fq.gz | ./reformat_sequence_data.rb > ./reformatsequence/$file.txt

$ python qualityMask.py ./reformatsequence/$file.txt  ./qualitymask/$file.seq 20 1

$ ./unique_seq_counts.rb ./qualitymask/$file.seq > ./uniquecounts/$file.counts.txt

$ cut -f1 ./uniquecounts/$file.counts.txt > ./uniqueseq/$file.uniq.seq

$ ./bowtie-1.1.0/bowtie -r -m1 -v1  ./bowtie-1.1.0/indexes/hg19 ./uniqueseq/$file.uniq.seq > ./alignedunique/$file.bowtie.txt
Question by:sebastizz
LVL 40

Accepted Solution

omarfarid earned 2000 total points
ID: 40236312
try this

cd /my/path
for file in `ls filename*` # put your criteria in place of filename* e.g. myfile*.ext
do
    file=${file%.fq.gz} # strip a trailing .fq.gz so that $file.fq.gz below names the real file
    gzcat $file.fq.gz | ./reformat_sequence_data.rb > ./reformatsequence/$file.txt
    python qualityMask.py ./reformatsequence/$file.txt ./qualitymask/$file.seq 20 1
    ./unique_seq_counts.rb ./qualitymask/$file.seq > ./uniquecounts/$file.counts.txt
    cut -f1 ./uniquecounts/$file.counts.txt > ./uniqueseq/$file.uniq.seq
    ./bowtie-1.1.0/bowtie -r -m1 -v1 ./bowtie-1.1.0/indexes/hg19 ./uniqueseq/$file.uniq.seq > ./alignedunique/$file.bowtie.txt
done

I assumed that you change to the directory that holds the files, your scripts, and the other folders.
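
For the part of the question about keeping only the file name and swapping the extension, shell parameter expansion is one option. This is a minimal sketch; the path in f is a hypothetical example, not one of the asker's files.

```shell
f="./Zipped/sample_1.fq.gz"   # hypothetical input path
name=${f##*/}                  # drop the folder path  -> sample_1.fq.gz
base=${name%.fq.gz}            # drop the extension    -> sample_1
out="./reformatsequence/$base.txt"
echo "$out"                    # prints ./reformatsequence/sample_1.txt
```

The same base value can then be reused for every later step, so each script gets a name built from the folder it reads from plus the new extension.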
LVL 48

Expert Comment

ID: 40237179
no need for the ls

for file in filename.*
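
A small illustration of why the plain pattern is usually preferred over ls: the output of ls is split on whitespace, so a file name containing a space turns into two loop items, while the pattern yields one item per file. The temporary directory and file names below are made up for the demonstration.

```shell
dir=$(mktemp -d)                       # scratch directory for the demo
touch "$dir/filename.a b.txt" "$dir/filename.c.txt"

ls_words=0
for file in `ls "$dir"`; do            # ls output is split on spaces and newlines
    ls_words=$((ls_words + 1))
done

glob_words=0
for file in "$dir"/filename.*; do      # the pattern gives one item per file
    glob_words=$((glob_words + 1))
done

echo "ls: $ls_words, glob: $glob_words"   # prints ls: 3, glob: 2
rm -rf "$dir"
```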



Author Comment

ID: 40237238
so does

for file in filename.*

mean every file in the directory I specify? That would be ideal.
Author Comment

ID: 40237298
So I've got as far as the code below. However, it stalls at the Python step. The error says:

Traceback (most recent call last):
  File "qualityMask.py", line 6, in <module>
    iFile = open(r'%s' % sys.argv[1], 'r')
IOError: [Errno 2] No such file or directory: './reformatsequence/SLX-8866.25%_A.000000000-A9AMY.s_1.r_1.fq.gz.txt.txt'
./unique_seq_counts.rb:3:in `each_line': Is a directory - ./qualitymask/ (Errno::EISDIR)
      from ./unique_seq_counts.rb:3:in `each'
      from ./unique_seq_counts.rb:3:in `<main>'

shopt -s extglob

for f in ./Zipped/*; do

gzcat $f | ./reformat_sequence_data.rb > $f.txt

file=$(basename $f)
python qualityMask.py ./reformatsequence/$file.txt  ./qualitymask/$file.seq 20 1

./unique_seq_counts.rb ./qualitymask/$fileseq > ./uniquecounts/$file.counts.txt

cut -f1 ./uniquecounts/$file.counts.txt > ./uniqueseq/$file.uniq.seq

./bowtie-1.1.0/bowtie -r -m1 -v1  ./bowtie-1.1.0/indexes/hg19 ./uniqueseq/$file.uniq.seq > ./alignedunique/$file.bowtie.txt
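
Reading the error against the loop above, two things seem likely to be involved. The first step writes to $f.txt (inside ./Zipped) while the Python step reads from ./reformatsequence, and ./qualitymask/$fileseq is missing the dot, so the unset variable $fileseq expands to nothing and the Ruby script is handed the directory ./qualitymask/. The loop also has no closing done. A possible corrected version follows, as a sketch that assumes the inputs end in .fq.gz and that the output folders already exist; the processed counter is added here only for illustration.

```shell
processed=0
for f in ./Zipped/*.fq.gz; do
    [ -e "$f" ] || continue             # skip the literal pattern when nothing matches
    file=$(basename "$f" .fq.gz)        # file name without folder path or .fq.gz

    # write into ./reformatsequence, which is where the Python step reads from
    gzcat "$f" | ./reformat_sequence_data.rb > "./reformatsequence/$file.txt"
    python qualityMask.py "./reformatsequence/$file.txt" "./qualitymask/$file.seq" 20 1
    # note the dot: $file.seq, not $fileseq
    ./unique_seq_counts.rb "./qualitymask/$file.seq" > "./uniquecounts/$file.counts.txt"
    cut -f1 "./uniquecounts/$file.counts.txt" > "./uniqueseq/$file.uniq.seq"
    ./bowtie-1.1.0/bowtie -r -m1 -v1 ./bowtie-1.1.0/indexes/hg19 "./uniqueseq/$file.uniq.seq" > "./alignedunique/$file.bowtie.txt"

    processed=$((processed + 1))
done
echo "processed $processed file(s)"
```

Quoting every expansion matters here because the real file names contain characters such as % that are safer inside double quotes.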



LVL 12

Expert Comment

ID: 40238120
Hi sebastizz,

Regarding your post #40237238:

> so does
> for file in filename.*
> mean every file in the directory I specify? That would be ideal.

Not quite.  It means every file starting with 'filename.'.  Change the word 'filename' to anything you like, or change the template to match the files you want to match (e.g. '*.gz' to match all files with a 'gz' extension).
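
To make the difference between the two templates concrete, here is a small sketch that counts what each one matches. The directory and the four file names are invented for the demonstration.

```shell
dir=$(mktemp -d)                        # scratch directory for the demo
touch "$dir/filename.txt" "$dir/filename.seq" "$dir/filename.fq.gz" "$dir/other.fq.gz"

set -- "$dir"/filename.*                # everything starting with 'filename.'
prefix_count=$#

set -- "$dir"/*.gz                      # everything ending in '.gz'
gz_count=$#

echo "filename.* matched $prefix_count, *.gz matched $gz_count"
rm -rf "$dir"
```

With these four files the first template matches three names and the second matches two, and only filename.fq.gz is matched by both.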
LVL 48

Expert Comment

ID: 40238191
If you want all files, use

for file in *

If you have subdirectories, then change to

for file in $(find . -maxdepth 1 -type f)
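
An alternative that avoids find, shown as a sketch with made-up names: keep the * pattern and skip anything that is not a regular file. Unlike the $(find ...) form, this does not split file names that contain spaces.

```shell
dir=$(mktemp -d)                        # scratch directory for the demo
mkdir "$dir/subdir"
touch "$dir/a.fq.gz" "$dir/b.fq.gz"

files=0
for file in "$dir"/*; do
    [ -f "$file" ] || continue          # skip directories and other non-regular files
    files=$((files + 1))
done

echo "$files regular file(s)"           # prints 2 regular file(s)
rm -rf "$dir"
```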


Question has a verified solution.
