Unix bash script for loop on sorted find results not working correctly (only 1 iteration)

I have a bash script that archives files. It can be called with no parameters, in which case it archives all algo data types, or with parameters: a space-delimited list of algo data types.

There are several algo data types: dynamic, arareports, session, staging, etc. The script iterates through the specified algo data types (or all of them) and calls archiveAlgoData, the routine that archives the data.

Each algo data type is handled slightly differently, but there is an archive part, then a cleanup part. There are two cleanup routines, cleanUpBaseKeep7 and cleanUpArchive.

I am finding that for one of my algo data types, the cleanUpArchive code does not work. Specifically, it runs find to get a list of directories in which a file is to be deleted, sorts that output, and then tries to iterate through the list of directories, of which there are three on my file system.

The problem is that for all of the algo data types except one, it works perfectly, but for one, arareports, it does not. When it works correctly, it finds the three directories and iterates through them. When it does not work, it finds the three directories, but the for loop makes only one iteration, with that single value being the three values concatenated with spaces or maybe end-of-line characters.
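
For illustration, here is a minimal sketch, independent of the script, that reproduces this one-iteration symptom; it assumes something has altered the shell's IFS, which is only a hypothesis:

# Sketch: how "for dir in $list" can make one iteration instead of three.
# bash splits the unquoted $list on the characters in IFS; if IFS no
# longer contains a newline, a newline-separated list stays one word.
list=$(printf '%s\n' /opt/ris/archive/20110101 /opt/ris/archive/20130510 /opt/ris/archive/20130531)

for dir in $list; do echo "considering folder $dir"; done
# default IFS (space, tab, newline): three iterations

IFS=' '                                  # hypothetical: IFS reduced to a space
for dir in $list; do echo "considering folder $dir"; done
# one iteration, the three paths still joined by newlines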

I will post the code and the log results. Here are the log results when it works:

2013/06/07 10:50:19.110938 +.011394 ----------------------------------------------------------------------------------------
2013/06/07 11:01:48.403787 +.------ ----------------------------------------------------------------------------------------
2013/06/07 11:01:48.410841 +.007054 ris_archive.sh script started
2013/06/07 11:01:48.420336 +.009495 There were 1 parameters, so archiving the following Algo Data Types: session
2013/06/07 11:01:48.429567 +.009231 Sesion Date found in /opt/ris/algo/staging/session_date.txt is [20130531]
2013/06/07 11:01:48.438421 +.008854 directory /opt/ris/archive already existed and did not need to be created
2013/06/07 11:01:48.451119 +.012698 directory /opt/ris/archive/checksums already existed and did not need to be created
2013/06/07 11:01:48.463438 +.012319 directory /opt/ris/archive/lastBackups already existed and did not need to be created
2013/06/07 11:01:48.471770 +.008332 ArchiveDirectory=/opt/ris/archive/20130531
2013/06/07 11:01:48.480614 +.008844 directory /opt/ris/archive/20130531 already existed and did not need to be created
2013/06/07 11:01:48.491084 +.010470 ................................................................................
2013/06/07 11:01:48.500049 +.008965 Archive Algo [session] Files
2013/06/07 11:01:48.510544 +.010495   Base Dir:                 /opt/ris/algo/algo_top/dynamic/riskwatch/sessionfiles
2013/06/07 11:01:48.519747 +.009203   Archive Dir:              /opt/ris/archive/20130531
2013/06/07 11:01:48.528610 +.008863   Files to Archive:         ARAM0002-rw.base.*
2013/06/07 11:01:48.539456 +.010846   Days to keep in archive:  365
2013/06/07 11:01:48.550538 +.011082 <algoFilesHaveChanged AlgoDataType=session; BaseDir=/opt/ris/algo/algo_top/dynamic/riskwatch/sessionfiles; ArchiveDir=/opt/ris/archive/20130531; FilesToArchive=ARAM0002-rw.base.*>
2013/06/07 11:01:48.571146 +.020608 .. MD5Filename /opt/ris/archive/checksums/session.md5 was found, so building new md5 to see if files have changed
2013/06/07 11:01:48.584677 +.013531 <createMd5> file /opt/ris/archive/checksums/session.md5.new for ARAM0002-rw.base.* files in folder /opt/ris/algo/algo_top/dynamic/riskwatch/sessionfiles
2013/06/07 11:01:48.725930 +.141253 </createMd5>
2013/06/07 11:01:48.741812 +.015882 .. Differences between the previous MD5 file and the new one:
2013/06/07 11:01:48.766814 +.025002 .. The MD5 files /opt/ris/archive/checksums/session.md5 and /opt/ris/archive/checksums/session.md5.new are the same, so no backup required, keeping the old MD5 file
2013/06/07 11:01:48.781681 +.242225 The files have not changed, so not doing a backup
2013/06/07 11:01:48.796506 +.014825 <cleanUpArchive AlgoDataType=session DaysToKeep=365 WillDeleteInFoldersBefore=/opt/ris/archive/20120531>
2013/06/07 11:01:48.829722 +.033216 .. # of files found: 1 filesFound=/opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531
2013/06/07 11:01:48.838507 +.008785 .. considering folder /opt/ris/archive/20110101, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.847581 +.009074 .. the folder /opt/ris/archive/20110101 will have the data for session DELETED
2013/06/07 11:01:48.860596 +.013015 .. considering folder /opt/ris/archive/20130510, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.869919 +.009323 .. the folder will have the data for session saved
2013/06/07 11:01:48.881189 +.011270 .. considering folder /opt/ris/archive/20130531, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.889769 +.008580 .. the folder will have the data for session saved
2013/06/07 11:01:48.901479 +.011710 .. considered 3 folders
2013/06/07 11:01:48.915303 +.013824 </cleanUpArchive>
2013/06/07 11:01:48.994242 +.078939 ris_archive.sh script finished normally with MainScriptExitCode=0
2013/06/07 11:01:49.004416 +.010174 ----------------------------------------------------------------------------------------

Notice how it says it is considering a folder three times. This entry says that we found a list of files (# of files found: 1, which is really wrong, it should be three), but in any case, there is a list of files, which is iterated over:

2013/06/07 11:01:48.829722 +.033216 .. # of files found: 1 filesFound=/opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531
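
As an aside, the wrong count of 1 has a simple explanation (a sketch, not from the thread): filesFound is assigned as a plain string, not an array, and ${#var[@]} applied to a scalar always reports 1. Counting the entries requires an actual array:

# Illustrative only:
filesFound=$(printf '%s\n' /opt/a /opt/b /opt/c)    # plain string
echo "${#filesFound[@]}"                            # 1 - a scalar counts as one element

filesArr=( $(printf '%s\n' /opt/a /opt/b /opt/c) )  # an array, via word splitting
echo "${#filesArr[@]}"                              # 3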

Here are the iterations:
2013/06/07 11:01:48.838507 +.008785 .. considering folder /opt/ris/archive/20110101, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.847581 +.009074 .. the folder /opt/ris/archive/20110101 will have the data for session DELETED
2013/06/07 11:01:48.860596 +.013015 .. considering folder /opt/ris/archive/20130510, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.869919 +.009323 .. the folder will have the data for session saved
2013/06/07 11:01:48.881189 +.011270 .. considering folder /opt/ris/archive/20130531, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.889769 +.008580 .. the folder will have the data for session saved
2013/06/07 11:01:48.901479 +.011710 .. considered 3 folders

However, when I run it for arareports, it doesn't work:
2013/06/07 11:01:49.004416 +.010174 ----------------------------------------------------------------------------------------
2013/06/07 11:04:37.999404 +.------ ----------------------------------------------------------------------------------------
2013/06/07 11:04:38.008447 +.009043 ris_archive.sh script started
2013/06/07 11:04:38.019335 +.010888 There were 1 parameters, so archiving the following Algo Data Types: arareports
2013/06/07 11:04:38.028332 +.008997 Sesion Date found in /opt/ris/algo/staging/session_date.txt is [20130531]
2013/06/07 11:04:38.039389 +.011057 directory /opt/ris/archive already existed and did not need to be created
2013/06/07 11:04:38.048357 +.008968 directory /opt/ris/archive/checksums already existed and did not need to be created
2013/06/07 11:04:38.057404 +.009047 directory /opt/ris/archive/lastBackups already existed and did not need to be created
2013/06/07 11:04:38.066782 +.009378 ArchiveDirectory=/opt/ris/archive/20130531
2013/06/07 11:04:38.076416 +.009634 directory /opt/ris/archive/20130531 already existed and did not need to be created
2013/06/07 11:04:38.089855 +.013439 ................................................................................
2013/06/07 11:04:38.098357 +.008502 Archive Algo [arareports] Files
2013/06/07 11:04:38.106744 +.008387   Base Dir:                 /opt/ris/algo/ara2541/data_ara/report_results
2013/06/07 11:04:38.116584 +.009840   Archive Dir:              /opt/ris/archive/20130531
2013/06/07 11:04:38.125925 +.009341   Files to Archive:         *
2013/06/07 11:04:38.134379 +.008454   Days to keep in archive:  2555
2013/06/07 11:04:38.145035 +.010656 <algoFilesHaveChanged AlgoDataType=arareports; BaseDir=/opt/ris/algo/ara2541/data_ara/report_results; ArchiveDir=/opt/ris/archive/20130531; FilesToArchive=*>
2013/06/07 11:04:38.156781 +.011746 .. MD5Filename /opt/ris/archive/checksums/arareports.md5 was found, so building new md5 to see if files have changed
2013/06/07 11:04:38.167111 +.010330 <createMd5> file /opt/ris/archive/checksums/arareports.md5.new for * files in folder /opt/ris/algo/ara2541/data_ara/report_results
2013/06/07 11:04:38.912308 +.745197 </createMd5>
2013/06/07 11:04:38.921446 +.009138 .. Differences between the previous MD5 file and the new one:
2013/06/07 11:04:38.939741 +.018295 .. The MD5 files /opt/ris/archive/checksums/arareports.md5 and /opt/ris/archive/checksums/arareports.md5.new are the same, so no backup required, keeping the old MD5 file
2013/06/07 11:04:38.954457 +.820078 The files have not changed, so not doing a backup
2013/06/07 11:04:38.964953 +.010496 <deleteFilesOlderThan 90 /opt/ris/algo/ara2541/data_ara/report_results *>
2013/06/07 11:04:38.983513 +.018560 Deleted 0 files
2013/06/07 11:04:38.991348 +.007835 </deleteFilesOlderThan>
2013/06/07 11:04:39.004326 +.012978 <cleanUpArchive AlgoDataType=arareports DaysToKeep=2555 WillDeleteInFoldersBefore=/opt/ris/archive/20060602>
2013/06/07 11:04:39.031301 +.026975 .. # of files found: 1 filesFound=/opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531
2013/06/07 11:04:39.040253 +.008952 .. considering folder /opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531, compareFolderName=/opt/ris/archive/20060602
2013/06/07 11:04:39.049300 +.009047 .. the folder will have the data for arareports saved
2013/06/07 11:04:39.060310 +.011010 .. considered 1 folders
2013/06/07 11:04:39.076224 +.015914 </cleanUpArchive>
2013/06/07 11:04:39.121325 +.045101 ris_archive.sh script finished normally with MainScriptExitCode=0
2013/06/07 11:04:39.129524 +.008199 ----------------------------------------------------------------------------------------

Do you notice how, even though there were three files, it only iterated through the loop once?
2013/06/07 11:04:39.040253 +.008952 .. considering folder /opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531, compareFolderName=/opt/ris/archive/20060602
2013/06/07 11:04:39.049300 +.009047 .. the folder will have the data for arareports saved
2013/06/07 11:04:39.060310 +.011010 .. considered 1 folders

I will attach my code. I am hoping that this rings a bell with Unix experts, because I have been trying for days to figure out what is different between arareports and the rest of the algo data types.
Attachments: ris-archive.sh, ris-archive-library.sh, 20130531.log, 20130531.windows.log
Asked by jkurant
 
jkurant (Author) commented:
I wasn't done submitting my question...

Whenever I run this, it fails:
./ris_archive.sh arareports

Whenever I run any of these, it works:
./ris_archive.sh staging
./ris_archive.sh dynamic

When I run this, it runs for all algo data types, and this works for all except arareports:
./ris_archive.sh
 
jkurant (Author) commented:
To make it a bit easier, here is the routine that is not working consistently:

cleanUpArchive() {
# Parameters:
#   $1 = AlgoDataType
#   $2 = Days to keep
# This will go through all of the dated archive folders, and if the dated archive
# folder is older than $DaysToKeep, then we will delete the file $AlgoDataType.tgz
  startDate=`date +%Y%m%d -d "$PositionDate -$2 days"`
  compareFolderName=$ARCHIVE_DIR/$startDate

  writeLog "<cleanUpArchive AlgoDataType=$1 DaysToKeep=$2 WillDeleteInFoldersBefore=$compareFolderName>"
  find $ARCHIVE_DIR -maxdepth 1 -type d -regex "$ARCHIVE_DIR/20.*" > $ARCHIVE_DIR/files.txt
  filesFound=`sort /opt/ris/archive/files.txt`
  writeLog ".. # of files found: ${#filesFound[@]} filesFound=$filesFound"

  foldersDeleted=0
  foldersFound=0
  for dir in $filesFound; do
    writeLog ".. considering folder $dir, compareFolderName=$compareFolderName"
    # if the folder is before the compareFolderName, which is $2 days before $PositionDate,
    # then we have to delete $1.tgz
    if [[ "$dir" < "$compareFolderName" ]]; then
      writeLog ".. the folder $dir will have the data for $1 DELETED"
      if [ -f $dir/$1.tgz ]; then
        rm $dir/$1.tgz >>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log
        if [ $? != 0 ]; then
          writeErr "cleanUpArchive: unable to remove the Algo Data Archive $dir/$1.tgz per the data-retention schedule"
          return 1
        fi
      fi
    else
      writeLog ".. the folder will have the data for $1 saved"
    fi
    foldersFound=`expr $foldersFound + 1`
  done
  writeLog ".. considered $foldersFound folders"

  rm $ARCHIVE_DIR/files.txt >>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log
  writeLog "</cleanUpArchive>"
}

 
jkurant (Author) commented:
I am baffled. I decided to just change the code in question and see if that fixed the problem. So I tried this new version of cleanUpArchive with a new find command, and I get the same results! It works for all except arareports.

cleanUpArchive() {
# Parameters:
#   $1 = AlgoDataType
#   $2 = Days to keep
# This will go through all of the dated archive folders, and if the dated archive
# folder is older than $DaysToKeep, then we will delete the file $AlgoDataType.tgz
  startDate=`date +%Y%m%d -d "$PositionDate -$2 days"`
  compareFolderName=$ARCHIVE_DIR/$startDate

  writeLog "<cleanUpArchive AlgoDataType=$1 DaysToKeep=$2 WillDeleteInFoldersBefore=$compareFolderName>"
  find $ARCHIVE_DIR -maxdepth 1 -type d -regex "$ARCHIVE_DIR/20.*" -printf %p@ | tr "@" "\n" | sort > $ARCHIVE_DIR/files.txt

  filesFound=`sort /opt/ris/archive/files.txt`
  writeLog ".. # of files found: ${#filesFound[@]}; filesFound=$filesFound"

  foldersDeleted=0
  foldersFound=0
  for dir in $filesFound; do
    writeLog ".. considering folder $dir, compareFolderName=$compareFolderName"
    # if the folder is before the compareFolderName, which is $2 days before $PositionDate,
    # then we have to delete $1.tgz
    if [[ "$dir" < "$compareFolderName" ]]; then
      writeLog ".. the folder $dir will have the data for $1 DELETED"
      if [ -f $dir/$1.tgz ]; then
        rm $dir/$1.tgz >>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log
        if [ $? != 0 ]; then
          writeErr "cleanUpArchive: unable to remove the Algo Data Archive $dir/$1.tgz per the data-retention schedule"
          return 1
        fi
      fi
    else
      writeLog ".. the folder will have the data for $1 saved"
    fi
    foldersFound=`expr $foldersFound + 1`
  done
  writeLog ".. considered $foldersFound folders"

  rm $ARCHIVE_DIR/files.txt >>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log
  writeLog "</cleanUpArchive>"
}

 
skullnobrains commented:
what is the output of the find command alone?
what are the access rights on the directories?
can you confirm there are no spaces in the path?
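
(one way to answer these from the shell, using the paths from the logs above; a sketch:)

# raw find output
find /opt/ris/archive -maxdepth 1 -type d -regex '/opt/ris/archive/20.*'
# access rights on the directories
ls -ld /opt/ris/archive/20*
# any directory names containing spaces?
find /opt/ris/archive -maxdepth 1 -type d -name '* *'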
 
Gerwin Jansen, EE MVE, Topic Advisor, commented:
I see that you use different ways of referring to variables' contents:

$ARCHIVE_DIR
$MD5Filename.new
${LOG_DIR}
${PositionDate}.log

I'd change all variables that don't have curly braces {} to include them, to prevent issues.
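
For example, the kind of ambiguity braces prevent (a sketch):

# Without braces, bash reads the longest legal variable name it can:
MD5Filename=/opt/ris/archive/checksums/session.md5
echo "$MD5Filename.new"     # fine: "." cannot be part of a name
echo "$MD5Filename_new"     # empty: bash looks up a variable named MD5Filename_new
echo "${MD5Filename}_new"   # braces make the boundary explicit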

For debugging purposes, start your script with the -xv option to see debug logging and find out where things go wrong.

Btw: you have a lot of 'double' redirections like:

>>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log

You can simplify like this:

>>${LOG_DIR}/${PositionDate}.log 2&1

This will redirect stdout to the logfile and append stderr to the same file.
 
jkurant (Author) commented:
This isn't really the answer to the problem, but it did teach me something new, so I will have to accept it as the answer.

However, I have determined the actual answer myself, which is that I need to add | tr '\n' ' ' to convert the end-of-line characters to spaces. I don't know why this is necessary, though, because I have seen many examples online of taking the output of a find command and iterating over it with a for statement. It only worked intermittently in my case, and I never found out what was making it work one time and not another.
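
In other words, the change amounts to this one line (a sketch of the workaround described above):

# flatten the newline-separated list to spaces before the for loop splits it
filesFound=`sort /opt/ris/archive/files.txt | tr '\n' ' '`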
 
Gerwin Jansen, EE MVE, Topic Advisor, commented:
Thanks, a minor correction:

>>${LOG_DIR}/${PositionDate}.log 2>&1

BTW: You don't have to accept a comment as an answer; you can wait a bit for other experts to respond, or ask for a moderator's attention using the Request Attention button above (you can still do that now if you like).
 
jkurant (Author) commented:
Thank you, Gerwin Jansen. I have the answer I need now, so I will leave the solution as is.
 
skullnobrains commented:
i do not care about points here, but beware when using the tr method in this way: if any of the file names contains a space, the script will mistake it for two files. also, if the find command outputs too many files (more than 4096, i guess, in a recent bash), your script will not work.

there are many ways past this, but a simple one is to use a while loop:

find ... | while read -r file
do
    ...
done

find itself can also -exec other commands on each result, and many implementations have sorting features as well.
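
sketches of both alternatives (-print0 and sort -z assume GNU find and GNU sort):

# 1) let find run the command itself on each directory
find "$ARCHIVE_DIR" -maxdepth 1 -type d -name '20*' -exec echo considering {} \;

# 2) null-delimited pipeline: safe even if a name contains spaces or newlines
find "$ARCHIVE_DIR" -maxdepth 1 -type d -name '20*' -print0 | sort -z |
while IFS= read -r -d '' dir
do
    echo "considering folder $dir"
done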