Solved

Unix bash script for loop on sorted find results not working correctly (only 1 iteration)

Posted on 2013-06-07
453 Views
Last Modified: 2013-06-10
I have a bash script that archives files. The script can be called with no parameters, in which case it archives all algo data types, or it can be called with parameters, which are a space-delimited list of algo data types.

There are several algo data types: dynamic, arareports, session, staging, etc. The script iterates through the algo data types specified, or all the data types, calling a routine that archives the data, archiveAlgoData.

Each algo data type is handled slightly differently, but there is an archive part, then a cleanup part. There are two cleanup routines, cleanUpBaseKeep7 and cleanUpArchive.

I am finding that for one of my algo data types, the cleanUpArchive code does not work. Specifically, it does a find to get a list of directories in which a file is to be deleted, does a sort on that output, and hopes to iterate through the list of directories, of which there are three on my file systems.

The problem is that for all of the algo data types except one, it works perfectly, but for one, arareports, it does not. When it works correctly, it finds the three directories and iterates through them. When it does not work, it finds the three directories, but the for loop makes only one iteration, with that single value being the three values concatenated with spaces or maybe end-of-line characters.
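To illustrate the symptom (this is only a guess at the mechanism, not something I have confirmed in my scripts): a for loop over an unquoted command substitution splits words on IFS, so if IFS somehow no longer contained a newline, all three paths would come through as a single word:

# Hypothetical reproduction of the symptom; NOT taken from my scripts.
list=`printf '%s\n' /opt/a /opt/b /opt/c`

# Default IFS (space, tab, newline): three iterations.
for d in $list; do echo "got: $d"; done

# If some earlier code changed IFS so it no longer contains a newline,
# the same loop makes one iteration whose value contains the newlines:
IFS=','
for d in $list; do echo "got: $d"; done
IFS=$' \t\n'   # restore the default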

I will post the code and the log results. Here are the log results when it works:

2013/06/07 10:50:19.110938 +.011394 ----------------------------------------------------------------------------------------
2013/06/07 11:01:48.403787 +.------ ----------------------------------------------------------------------------------------
2013/06/07 11:01:48.410841 +.007054 ris_archive.sh script started
2013/06/07 11:01:48.420336 +.009495 There were 1 parameters, so archiving the following Algo Data Types: session
2013/06/07 11:01:48.429567 +.009231 Sesion Date found in /opt/ris/algo/staging/session_date.txt is [20130531]
2013/06/07 11:01:48.438421 +.008854 directory /opt/ris/archive already existed and did not need to be created
2013/06/07 11:01:48.451119 +.012698 directory /opt/ris/archive/checksums already existed and did not need to be created
2013/06/07 11:01:48.463438 +.012319 directory /opt/ris/archive/lastBackups already existed and did not need to be created
2013/06/07 11:01:48.471770 +.008332 ArchiveDirectory=/opt/ris/archive/20130531
2013/06/07 11:01:48.480614 +.008844 directory /opt/ris/archive/20130531 already existed and did not need to be created
2013/06/07 11:01:48.491084 +.010470 ................................................................................
2013/06/07 11:01:48.500049 +.008965 Archive Algo [session] Files
2013/06/07 11:01:48.510544 +.010495   Base Dir:                 /opt/ris/algo/algo_top/dynamic/riskwatch/sessionfiles
2013/06/07 11:01:48.519747 +.009203   Archive Dir:              /opt/ris/archive/20130531
2013/06/07 11:01:48.528610 +.008863   Files to Archive:         ARAM0002-rw.base.*
2013/06/07 11:01:48.539456 +.010846   Days to keep in archive:  365
2013/06/07 11:01:48.550538 +.011082 <algoFilesHaveChanged AlgoDataType=session; BaseDir=/opt/ris/algo/algo_top/dynamic/riskwatch/sessionfiles; ArchiveDir=/opt/ris/archive/20130531; FilesToArchive=ARAM0002-rw.base.*>
2013/06/07 11:01:48.571146 +.020608 .. MD5Filename /opt/ris/archive/checksums/session.md5 was found, so building new md5 to see if files have changed
2013/06/07 11:01:48.584677 +.013531 <createMd5> file /opt/ris/archive/checksums/session.md5.new for ARAM0002-rw.base.* files in folder /opt/ris/algo/algo_top/dynamic/riskwatch/sessionfiles
2013/06/07 11:01:48.725930 +.141253 </createMd5>
2013/06/07 11:01:48.741812 +.015882 .. Differences between the previous MD5 file and the new one:
2013/06/07 11:01:48.766814 +.025002 .. The MD5 files /opt/ris/archive/checksums/session.md5 and /opt/ris/archive/checksums/session.md5.new are the same, so no backup required, keeping the old MD5 file
2013/06/07 11:01:48.781681 +.242225 The files have not changed, so not doing a backup
2013/06/07 11:01:48.796506 +.014825 <cleanUpArchive AlgoDataType=session DaysToKeep=365 WillDeleteInFoldersBefore=/opt/ris/archive/20120531>
2013/06/07 11:01:48.829722 +.033216 .. # of files found: 1 filesFound=/opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531
2013/06/07 11:01:48.838507 +.008785 .. considering folder /opt/ris/archive/20110101, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.847581 +.009074 .. the folder /opt/ris/archive/20110101 will have the data for session DELETED
2013/06/07 11:01:48.860596 +.013015 .. considering folder /opt/ris/archive/20130510, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.869919 +.009323 .. the folder will have the data for session saved
2013/06/07 11:01:48.881189 +.011270 .. considering folder /opt/ris/archive/20130531, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.889769 +.008580 .. the folder will have the data for session saved
2013/06/07 11:01:48.901479 +.011710 .. considered 3 folders
2013/06/07 11:01:48.915303 +.013824 </cleanUpArchive>
2013/06/07 11:01:48.994242 +.078939 ris_archive.sh script finished normally with MainScriptExitCode=0
2013/06/07 11:01:49.004416 +.010174 ----------------------------------------------------------------------------------------



Notice how it says it is considering a folder three times. This entry says that we found a list of files (# of files found: 1, which is really wrong; it should be three), but in any case there is a list of files, which is iterated over:

2013/06/07 11:01:48.829722 +.033216 .. # of files found: 1 filesFound=/opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531
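Side note on that count: filesFound is assigned with backticks, so it is a plain string, not an array, and ${#filesFound[@]} applied to a plain string is always 1 in bash. Counting the real entries would need an actual array, for example (a sketch, assuming bash 4+ for mapfile):

# filesFound=`sort ...` creates a scalar, so ${#filesFound[@]} reports 1.
# A real array gives a true element count (requires bash >= 4):
mapfile -t filesFound < <(sort $ARCHIVE_DIR/files.txt)
echo "# of files found: ${#filesFound[@]}"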



Here are the iterations:
2013/06/07 11:01:48.838507 +.008785 .. considering folder /opt/ris/archive/20110101, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.847581 +.009074 .. the folder /opt/ris/archive/20110101 will have the data for session DELETED
2013/06/07 11:01:48.860596 +.013015 .. considering folder /opt/ris/archive/20130510, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.869919 +.009323 .. the folder will have the data for session saved
2013/06/07 11:01:48.881189 +.011270 .. considering folder /opt/ris/archive/20130531, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.889769 +.008580 .. the folder will have the data for session saved
2013/06/07 11:01:48.901479 +.011710 .. considered 3 folders



However, when I run it for arareports, it doesn't work:
2013/06/07 11:01:49.004416 +.010174 ----------------------------------------------------------------------------------------
2013/06/07 11:04:37.999404 +.------ ----------------------------------------------------------------------------------------
2013/06/07 11:04:38.008447 +.009043 ris_archive.sh script started
2013/06/07 11:04:38.019335 +.010888 There were 1 parameters, so archiving the following Algo Data Types: arareports
2013/06/07 11:04:38.028332 +.008997 Sesion Date found in /opt/ris/algo/staging/session_date.txt is [20130531]
2013/06/07 11:04:38.039389 +.011057 directory /opt/ris/archive already existed and did not need to be created
2013/06/07 11:04:38.048357 +.008968 directory /opt/ris/archive/checksums already existed and did not need to be created
2013/06/07 11:04:38.057404 +.009047 directory /opt/ris/archive/lastBackups already existed and did not need to be created
2013/06/07 11:04:38.066782 +.009378 ArchiveDirectory=/opt/ris/archive/20130531
2013/06/07 11:04:38.076416 +.009634 directory /opt/ris/archive/20130531 already existed and did not need to be created
2013/06/07 11:04:38.089855 +.013439 ................................................................................
2013/06/07 11:04:38.098357 +.008502 Archive Algo [arareports] Files
2013/06/07 11:04:38.106744 +.008387   Base Dir:                 /opt/ris/algo/ara2541/data_ara/report_results
2013/06/07 11:04:38.116584 +.009840   Archive Dir:              /opt/ris/archive/20130531
2013/06/07 11:04:38.125925 +.009341   Files to Archive:         *
2013/06/07 11:04:38.134379 +.008454   Days to keep in archive:  2555
2013/06/07 11:04:38.145035 +.010656 <algoFilesHaveChanged AlgoDataType=arareports; BaseDir=/opt/ris/algo/ara2541/data_ara/report_results; ArchiveDir=/opt/ris/archive/20130531; FilesToArchive=*>
2013/06/07 11:04:38.156781 +.011746 .. MD5Filename /opt/ris/archive/checksums/arareports.md5 was found, so building new md5 to see if files have changed
2013/06/07 11:04:38.167111 +.010330 <createMd5> file /opt/ris/archive/checksums/arareports.md5.new for * files in folder /opt/ris/algo/ara2541/data_ara/report_results
2013/06/07 11:04:38.912308 +.745197 </createMd5>
2013/06/07 11:04:38.921446 +.009138 .. Differences between the previous MD5 file and the new one:
2013/06/07 11:04:38.939741 +.018295 .. The MD5 files /opt/ris/archive/checksums/arareports.md5 and /opt/ris/archive/checksums/arareports.md5.new are the same, so no backup required, keeping the old MD5 file
2013/06/07 11:04:38.954457 +.820078 The files have not changed, so not doing a backup
2013/06/07 11:04:38.964953 +.010496 <deleteFilesOlderThan 90 /opt/ris/algo/ara2541/data_ara/report_results *>
2013/06/07 11:04:38.983513 +.018560 Deleted 0 files
2013/06/07 11:04:38.991348 +.007835 </deleteFilesOlderThan>
2013/06/07 11:04:39.004326 +.012978 <cleanUpArchive AlgoDataType=arareports DaysToKeep=2555 WillDeleteInFoldersBefore=/opt/ris/archive/20060602>
2013/06/07 11:04:39.031301 +.026975 .. # of files found: 1 filesFound=/opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531
2013/06/07 11:04:39.040253 +.008952 .. considering folder /opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531, compareFolderName=/opt/ris/archive/20060602
2013/06/07 11:04:39.049300 +.009047 .. the folder will have the data for arareports saved
2013/06/07 11:04:39.060310 +.011010 .. considered 1 folders
2013/06/07 11:04:39.076224 +.015914 </cleanUpArchive>
2013/06/07 11:04:39.121325 +.045101 ris_archive.sh script finished normally with MainScriptExitCode=0
2013/06/07 11:04:39.129524 +.008199 ----------------------------------------------------------------------------------------



Do you notice how, even though there were three files, it only iterated through the loop once?
2013/06/07 11:04:39.040253 +.008952 .. considering folder /opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531, compareFolderName=/opt/ris/archive/20060602
2013/06/07 11:04:39.049300 +.009047 .. the folder will have the data for arareports saved
2013/06/07 11:04:39.060310 +.011010 .. considered 1 folders



I will attach my code. I am hoping this rings a bell with the Unix experts, because I have been trying for days to figure out what is different between arareports and the rest of the algo data types.
ris-archive.sh
ris-archive-library.sh
20130531.log
20130531.windows.log
Question by:jkurant
9 Comments
 

Author Comment

by:jkurant
I wasn't done submitting my question...

Whenever I run this, it fails:
./ris_archive.sh arareports

Whenever I run any of these, it works:
./ris_archive.sh staging
./ris_archive.sh dynamic

When I run this, it runs for all algo data types, and this works for all except arareports:
./ris_archive.sh
 

Author Comment

by:jkurant
To make it a bit easier, here is the routine that is not working consistently:

cleanUpArchive() {
# Parameters:
#   $1 = AlgoDataType
#   $2 = Days to keep
# This will go through all of the dated archive folders, and if the dated archive
# folder is older than $DaysToKeep, then we will delete the file $AlgoDataType.tgz
  startDate=`date +%Y%m%d -d "$PositionDate -$2 days"`
  compareFolderName=$ARCHIVE_DIR/$startDate

  writeLog "<cleanUpArchive AlgoDataType=$1 DaysToKeep=$2 WillDeleteInFoldersBefore=$compareFolderName>"
  find $ARCHIVE_DIR -maxdepth 1 -type d -regex "$ARCHIVE_DIR/20.*" > $ARCHIVE_DIR/files.txt
  filesFound=`sort /opt/ris/archive/files.txt`
  writeLog ".. # of files found: ${#filesFound[@]} filesFound=$filesFound"

  foldersDeleted=0
  foldersFound=0
  for dir in $filesFound; do
    writeLog ".. considering folder $dir, compareFolderName=$compareFolderName"
    # if the folder is before the compareFolderName, which is $2 days before
    # $PositionDate, then we have to delete $1.tgz
    if [[ "$dir" < "$compareFolderName" ]]; then
      writeLog ".. the folder $dir will have the data for $1 DELETED"
      if [ -f $dir/$1.tgz ]; then
        rm $dir/$1.tgz >>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log
        if [ $? != 0 ]; then
          writeErr "cleanUpArchive: unable to remove the Algo Data Archive $dir/$1.tgz per the data-retention schedule"
          return 1
        fi
      fi
    else
      writeLog ".. the folder will have the data for $1 saved"
    fi
    foldersFound=`expr $foldersFound + 1`
  done
  writeLog ".. considered $foldersFound folders"

  rm $ARCHIVE_DIR/files.txt >>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log
  writeLog "</cleanUpArchive>"
}
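For reference, a more defensive shape for this loop collects the sorted list into a real array before iterating, so word splitting never enters into it. A sketch only (assumes bash 4+ for mapfile; the error handling from the original is trimmed):

# Sketch: read the sorted folder list into a real array (bash >= 4),
# then iterate over the array elements; IFS word splitting never applies.
mapfile -t dirs < <(sort $ARCHIVE_DIR/files.txt)
writeLog ".. # of files found: ${#dirs[@]}"
for dir in "${dirs[@]}"; do
  writeLog ".. considering folder $dir, compareFolderName=$compareFolderName"
  if [[ "$dir" < "$compareFolderName" ]]; then
    writeLog ".. the folder $dir will have the data for $1 DELETED"
    rm -f "$dir/$1.tgz"
  else
    writeLog ".. the folder will have the data for $1 saved"
  fi
done
writeLog ".. considered ${#dirs[@]} folders"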


 

Author Comment

by:jkurant
I am baffled. I decided to just change the code in question and see if that fixed the problem. So I tried this new version of cleanUpArchive with a new find command, and I get the same results! It works for all except arareports.

cleanUpArchive() {
# Parameters:
#   $1 = AlgoDataType
#   $2 = Days to keep
# This will go through all of the dated archive folders, and if the dated archive
# folder is older than $DaysToKeep, then we will delete the file $AlgoDataType.tgz
  startDate=`date +%Y%m%d -d "$PositionDate -$2 days"`
  compareFolderName=$ARCHIVE_DIR/$startDate

  writeLog "<cleanUpArchive AlgoDataType=$1 DaysToKeep=$2 WillDeleteInFoldersBefore=$compareFolderName>"
  find $ARCHIVE_DIR -maxdepth 1 -type d -regex "$ARCHIVE_DIR/20.*" -printf %p@ | tr "@" "\n" | sort > $ARCHIVE_DIR/files.txt

  filesFound=`sort /opt/ris/archive/files.txt`
  writeLog ".. # of files found: ${#filesFound[@]}; filesFound=$filesFound"

  foldersDeleted=0
  foldersFound=0
  for dir in $filesFound; do
    writeLog ".. considering folder $dir, compareFolderName=$compareFolderName"
    # if the folder is before the compareFolderName, which is $2 days before
    # $PositionDate, then we have to delete $1.tgz
    if [[ "$dir" < "$compareFolderName" ]]; then
      writeLog ".. the folder $dir will have the data for $1 DELETED"
      if [ -f $dir/$1.tgz ]; then
        rm $dir/$1.tgz >>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log
        if [ $? != 0 ]; then
          writeErr "cleanUpArchive: unable to remove the Algo Data Archive $dir/$1.tgz per the data-retention schedule"
          return 1
        fi
      fi
    else
      writeLog ".. the folder will have the data for $1 saved"
    fi
    foldersFound=`expr $foldersFound + 1`
  done
  writeLog ".. considered $foldersFound folders"

  rm $ARCHIVE_DIR/files.txt >>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log
  writeLog "</cleanUpArchive>"
}


 

Expert Comment

by:skullnobrains
What is the output of the find command alone?
What are the access rights on the directories?
Can you confirm there are no spaces in the path?
 

Accepted Solution

by: Gerwin Jansen (earned 500 total points)
I see that you use different ways of referring to variables' contents:

$ARCHIVE_DIR
$MD5Filename.new
${LOG_DIR}
${PositionDate}.log

I'd change all variables that don't have curly braces {} to include them, to prevent issues.
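For example (an illustration, not taken from the posted scripts), the braces matter whenever the next character could legally be part of a variable name:

file=report
echo $file_name     # expands the (unset) variable file_name: prints nothing
echo ${file}_name   # expands file, then appends _name: prints report_name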

For debugging purposes, start your script with the -xv option to see debug output and find out where things go wrong.

Btw: you have a lot of 'double' redirections like:

>>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log

You can simplify like this:

>>${LOG_DIR}/${PositionDate}.log 2&1

This will redirect stdout to the logfile and send stderr to the same place.
 

Author Closing Comment

by:jkurant
This isn't really the answer to the problem, but it did teach me something new, so I will have to accept it as the answer.

However, I have determined the actual answer myself, which is that I need to add | tr '\n' ' ' to convert the end-of-line characters to spaces (see the sketch below). I don't know why this is necessary, though, because I have seen many examples online of taking the output of a find command and iterating over it with a for statement. It only worked intermittently in my case, and I never found out what was making it work one time and not the other.
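Applied to the assignment in cleanUpArchive, the workaround looks like this (the exact placement shown is illustrative):

# Flatten the sorted list onto one line so the for loop sees
# space-separated words regardless of how newlines are being treated:
filesFound=`sort /opt/ris/archive/files.txt | tr '\n' ' '`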
 

Expert Comment

by:Gerwin Jansen
Thanks, a minor correction:

>>${LOG_DIR}/${PositionDate}.log 2>&1
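Applied to one of the rm lines in cleanUpArchive, that would read (illustrative):

# One redirection for stdout, then stderr duplicated onto it:
rm $dir/$1.tgz >>${LOG_DIR}/${PositionDate}.log 2>&1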

BTW: You don't have to accept a comment as an answer; you can wait a bit for other experts to respond, or ask for a moderator's attention using the Request Attention button above (you can still do that now if you like).
 

Author Comment

by:jkurant
Thank you, Gerwin Jansen. I have the answer I need now, so I will leave the solution as is.
 

Expert Comment

by:skullnobrains
I do not care about points here, but beware when using the tr method in this way: if any of the file names has a space in it, the script will mistake it for two files. Also, if the find command outputs too many files (more than 4096, I guess, in a recent bash), your script will not work.

There are many ways past this, but a simple one is to use a while loop:

find ... | while read file
do
    ...
done

find itself can also -exec other commands iteratively, and it has sorting features in many implementations.
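For the space-safe variant, GNU find can emit null-delimited names that a null-delimited read consumes intact; a sketch (assumes GNU find and GNU sort):

# Null-delimited end to end, so spaces or newlines inside names can
# neither split one entry nor merge several:
find "$ARCHIVE_DIR" -maxdepth 1 -type d -name '20*' -print0 |
  sort -z |
  while IFS= read -r -d '' dir; do
    echo "processing $dir"
  done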
