Solved

Unix bash script for loop on sorted find results not working correctly (only 1 iteration)

Posted on 2013-06-07
471 Views
Last Modified: 2013-06-10
I have a bash script that archives files. The script can be called with no parameters, in which case it archives all algo data types, or it can be called with parameters, which are a space-delimited list of algo data types.

There are several algo data types: dynamic, arareports, session, staging, etc. The script iterates through the algo data types specified, or all the data types, calling a routine that archives the data, archiveAlgoData.

Each algo data type is handled slightly differently, but there is an archive part, then a cleanup part. There are two cleanup routines, cleanUpBaseKeep7 and cleanUpArchive.

I am finding that for one of my algo data types, the cleanUpArchive code does not work. Specifically, it does a find to get a list of directories in which a file is to be deleted, does a sort on that output, and hopes to iterate through the list of directories, of which there are three on my file systems.

The problem is that for all of the algo data types except one, arareports, it works perfectly. When it works correctly, it finds the three directories and iterates through them. When it does not work, it finds the same three directories, but the for loop makes only one iteration, with the loop variable holding all three values concatenated with spaces or maybe end-of-line characters.
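
For reference, here is the behavior I expect, as a trivial standalone sketch (not my script's code): with bash's default IFS, an unquoted variable in a for statement is split on spaces, tabs, and newlines, so a newline-separated list gives one iteration per line.

  # standalone sketch: default IFS splits on newlines as well as spaces
  list=$(printf '%s\n' /opt/ris/archive/20110101 /opt/ris/archive/20130510 /opt/ris/archive/20130531)
  for dir in $list; do
    echo "considering: $dir"   # prints three lines
  done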

I will post the code and the log results. Here are the log results when it works:

2013/06/07 10:50:19.110938 +.011394 ----------------------------------------------------------------------------------------
2013/06/07 11:01:48.403787 +.------ ----------------------------------------------------------------------------------------
2013/06/07 11:01:48.410841 +.007054 ris_archive.sh script started
2013/06/07 11:01:48.420336 +.009495 There were 1 parameters, so archiving the following Algo Data Types: session
2013/06/07 11:01:48.429567 +.009231 Sesion Date found in /opt/ris/algo/staging/session_date.txt is [20130531]
2013/06/07 11:01:48.438421 +.008854 directory /opt/ris/archive already existed and did not need to be created
2013/06/07 11:01:48.451119 +.012698 directory /opt/ris/archive/checksums already existed and did not need to be created
2013/06/07 11:01:48.463438 +.012319 directory /opt/ris/archive/lastBackups already existed and did not need to be created
2013/06/07 11:01:48.471770 +.008332 ArchiveDirectory=/opt/ris/archive/20130531
2013/06/07 11:01:48.480614 +.008844 directory /opt/ris/archive/20130531 already existed and did not need to be created
2013/06/07 11:01:48.491084 +.010470 ................................................................................
2013/06/07 11:01:48.500049 +.008965 Archive Algo [session] Files
2013/06/07 11:01:48.510544 +.010495   Base Dir:                 /opt/ris/algo/algo_top/dynamic/riskwatch/sessionfiles
2013/06/07 11:01:48.519747 +.009203   Archive Dir:              /opt/ris/archive/20130531
2013/06/07 11:01:48.528610 +.008863   Files to Archive:         ARAM0002-rw.base.*
2013/06/07 11:01:48.539456 +.010846   Days to keep in archive:  365
2013/06/07 11:01:48.550538 +.011082 <algoFilesHaveChanged AlgoDataType=session; BaseDir=/opt/ris/algo/algo_top/dynamic/riskwatch/sessionfiles; ArchiveDir=/opt/ris/archive/20130531; FilesToArchive=ARAM0002-rw.base.*>
2013/06/07 11:01:48.571146 +.020608 .. MD5Filename /opt/ris/archive/checksums/session.md5 was found, so building new md5 to see if files have changed
2013/06/07 11:01:48.584677 +.013531 <createMd5> file /opt/ris/archive/checksums/session.md5.new for ARAM0002-rw.base.* files in folder /opt/ris/algo/algo_top/dynamic/riskwatch/sessionfiles
2013/06/07 11:01:48.725930 +.141253 </createMd5>
2013/06/07 11:01:48.741812 +.015882 .. Differences between the previous MD5 file and the new one:
2013/06/07 11:01:48.766814 +.025002 .. The MD5 files /opt/ris/archive/checksums/session.md5 and /opt/ris/archive/checksums/session.md5.new are the same, so no backup required, keeping the old MD5 file
2013/06/07 11:01:48.781681 +.242225 The files have not changed, so not doing a backup
2013/06/07 11:01:48.796506 +.014825 <cleanUpArchive AlgoDataType=session DaysToKeep=365 WillDeleteInFoldersBefore=/opt/ris/archive/20120531>
2013/06/07 11:01:48.829722 +.033216 .. # of files found: 1 filesFound=/opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531
2013/06/07 11:01:48.838507 +.008785 .. considering folder /opt/ris/archive/20110101, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.847581 +.009074 .. the folder /opt/ris/archive/20110101 will have the data for session DELETED
2013/06/07 11:01:48.860596 +.013015 .. considering folder /opt/ris/archive/20130510, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.869919 +.009323 .. the folder will have the data for session saved
2013/06/07 11:01:48.881189 +.011270 .. considering folder /opt/ris/archive/20130531, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.889769 +.008580 .. the folder will have the data for session saved
2013/06/07 11:01:48.901479 +.011710 .. considered 3 folders
2013/06/07 11:01:48.915303 +.013824 </cleanUpArchive>
2013/06/07 11:01:48.994242 +.078939 ris_archive.sh script finished normally with MainScriptExitCode=0
2013/06/07 11:01:49.004416 +.010174 ----------------------------------------------------------------------------------------

Notice how it says it is considering a folder three times. This entry says we found a list of files (# of files found: 1, which is really wrong, it should be three), but in any case there is a list of files, which is iterated over:

2013/06/07 11:01:48.829722 +.033216 .. # of files found: 1 filesFound=/opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531
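
As an aside, I believe the count of 1 in that line is a separate, smaller bug: filesFound is a plain string, and in bash ${#var[@]} applied to a scalar variable always counts one element, no matter how many lines the string holds. A standalone sketch (not my script's code):

  filesFound=$(printf '%s\n' a b c)
  echo "${#filesFound[@]}"   # prints 1 -- a scalar counts as one element
  mapfile -t arr < <(printf '%s\n' a b c)   # mapfile needs bash 4
  echo "${#arr[@]}"          # prints 3 -- a real array counts its elements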

Here are the iterations:
2013/06/07 11:01:48.838507 +.008785 .. considering folder /opt/ris/archive/20110101, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.847581 +.009074 .. the folder /opt/ris/archive/20110101 will have the data for session DELETED
2013/06/07 11:01:48.860596 +.013015 .. considering folder /opt/ris/archive/20130510, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.869919 +.009323 .. the folder will have the data for session saved
2013/06/07 11:01:48.881189 +.011270 .. considering folder /opt/ris/archive/20130531, compareFolderName=/opt/ris/archive/20120531
2013/06/07 11:01:48.889769 +.008580 .. the folder will have the data for session saved
2013/06/07 11:01:48.901479 +.011710 .. considered 3 folders

However, when I run it for arareports, it doesn't work:
2013/06/07 11:01:49.004416 +.010174 ----------------------------------------------------------------------------------------
2013/06/07 11:04:37.999404 +.------ ----------------------------------------------------------------------------------------
2013/06/07 11:04:38.008447 +.009043 ris_archive.sh script started
2013/06/07 11:04:38.019335 +.010888 There were 1 parameters, so archiving the following Algo Data Types: arareports
2013/06/07 11:04:38.028332 +.008997 Sesion Date found in /opt/ris/algo/staging/session_date.txt is [20130531]
2013/06/07 11:04:38.039389 +.011057 directory /opt/ris/archive already existed and did not need to be created
2013/06/07 11:04:38.048357 +.008968 directory /opt/ris/archive/checksums already existed and did not need to be created
2013/06/07 11:04:38.057404 +.009047 directory /opt/ris/archive/lastBackups already existed and did not need to be created
2013/06/07 11:04:38.066782 +.009378 ArchiveDirectory=/opt/ris/archive/20130531
2013/06/07 11:04:38.076416 +.009634 directory /opt/ris/archive/20130531 already existed and did not need to be created
2013/06/07 11:04:38.089855 +.013439 ................................................................................
2013/06/07 11:04:38.098357 +.008502 Archive Algo [arareports] Files
2013/06/07 11:04:38.106744 +.008387   Base Dir:                 /opt/ris/algo/ara2541/data_ara/report_results
2013/06/07 11:04:38.116584 +.009840   Archive Dir:              /opt/ris/archive/20130531
2013/06/07 11:04:38.125925 +.009341   Files to Archive:         *
2013/06/07 11:04:38.134379 +.008454   Days to keep in archive:  2555
2013/06/07 11:04:38.145035 +.010656 <algoFilesHaveChanged AlgoDataType=arareports; BaseDir=/opt/ris/algo/ara2541/data_ara/report_results; ArchiveDir=/opt/ris/archive/20130531; FilesToArchive=*>
2013/06/07 11:04:38.156781 +.011746 .. MD5Filename /opt/ris/archive/checksums/arareports.md5 was found, so building new md5 to see if files have changed
2013/06/07 11:04:38.167111 +.010330 <createMd5> file /opt/ris/archive/checksums/arareports.md5.new for * files in folder /opt/ris/algo/ara2541/data_ara/report_results
2013/06/07 11:04:38.912308 +.745197 </createMd5>
2013/06/07 11:04:38.921446 +.009138 .. Differences between the previous MD5 file and the new one:
2013/06/07 11:04:38.939741 +.018295 .. The MD5 files /opt/ris/archive/checksums/arareports.md5 and /opt/ris/archive/checksums/arareports.md5.new are the same, so no backup required, keeping the old MD5 file
2013/06/07 11:04:38.954457 +.820078 The files have not changed, so not doing a backup
2013/06/07 11:04:38.964953 +.010496 <deleteFilesOlderThan 90 /opt/ris/algo/ara2541/data_ara/report_results *>
2013/06/07 11:04:38.983513 +.018560 Deleted 0 files
2013/06/07 11:04:38.991348 +.007835 </deleteFilesOlderThan>
2013/06/07 11:04:39.004326 +.012978 <cleanUpArchive AlgoDataType=arareports DaysToKeep=2555 WillDeleteInFoldersBefore=/opt/ris/archive/20060602>
2013/06/07 11:04:39.031301 +.026975 .. # of files found: 1 filesFound=/opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531
2013/06/07 11:04:39.040253 +.008952 .. considering folder /opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531, compareFolderName=/opt/ris/archive/20060602
2013/06/07 11:04:39.049300 +.009047 .. the folder will have the data for arareports saved
2013/06/07 11:04:39.060310 +.011010 .. considered 1 folders
2013/06/07 11:04:39.076224 +.015914 </cleanUpArchive>
2013/06/07 11:04:39.121325 +.045101 ris_archive.sh script finished normally with MainScriptExitCode=0
2013/06/07 11:04:39.129524 +.008199 ----------------------------------------------------------------------------------------

Do you notice how, even though there were three files, it only iterated through the loop once?
2013/06/07 11:04:39.040253 +.008952 .. considering folder /opt/ris/archive/20110101
/opt/ris/archive/20130510
/opt/ris/archive/20130531, compareFolderName=/opt/ris/archive/20060602
2013/06/07 11:04:39.049300 +.009047 .. the folder will have the data for arareports saved
2013/06/07 11:04:39.060310 +.011010 .. considered 1 folders

I will attach my code. I am hoping this rings a bell with the Unix experts, because I have been trying for days to figure out what is different between arareports and the rest of the algo data types.
ris-archive.sh
ris-archive-library.sh
20130531.log
20130531.windows.log
Question by:jkurant
9 Comments
 

Author Comment

by:jkurant
ID: 39229347
I wasn't done submitting my question...

Whenever I run this, it fails:
./ris_archive.sh arareports

Whenever I run any of these, it works:
./ris_archive.sh staging
./ris_archive.sh dynamic

When I run this, it runs for all algo data types, and this works for all except arareports:
./ris_archive.sh
 

Author Comment

by:jkurant
ID: 39229418
To make it a bit easier, here is the routine that is not working consistently:

cleanUpArchive() {
# Parameters:
#   $1 = AlgoDataType
#   $2 = Days to keep
# This will go through all of the dated archive folders, and if the dated archive folder is older than $DaysToKeep,
# then we will delete the file $AlgoDataType.tgz
  startDate=`date +%Y%m%d -d "$PositionDate -$2 days"`
  compareFolderName=$ARCHIVE_DIR/$startDate

  writeLog "<cleanUpArchive AlgoDataType=$1 DaysToKeep=$2 WillDeleteInFoldersBefore=$compareFolderName>"
  find $ARCHIVE_DIR -maxdepth 1 -type d -regex "$ARCHIVE_DIR/20.*" > $ARCHIVE_DIR/files.txt
  filesFound=`sort /opt/ris/archive/files.txt`
  writeLog ".. # of files found: ${#filesFound[@]} filesFound=$filesFound"

  foldersDeleted=0
  foldersFound=0
  for dir in $filesFound; do
    writeLog ".. considering folder $dir, compareFolderName=$compareFolderName"
    # if the folder is before the compareFolderName, which is $2 days before $PositionDate, then we have to delete $1.tgz
    if [[ "$dir" < "$compareFolderName" ]]; then
      writeLog ".. the folder $dir will have the data for $1 DELETED"
      if [ -f $dir/$1.tgz ]; then
        rm $dir/$1.tgz >>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log
        if [ $? != 0 ]; then
          writeErr "cleanUpArchive: unable to remove the Algo Data Archive $dir/$1.tgz per the data-retention schedule"
          return 1
        fi
      fi
    else
      writeLog ".. the folder will have the data for $1 saved"
    fi
    foldersFound=`expr $foldersFound + 1`
  done
  writeLog ".. considered $foldersFound folders"

  rm $ARCHIVE_DIR/files.txt >>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log
  writeLog "</cleanUpArchive>"
}
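
For what it's worth, one thing I have not tried yet is reading the list into a real bash array, which should sidestep word splitting entirely. A sketch of just the capture-and-loop part (mapfile needs bash 4; this is not the deployed code):

  # sketch only: capture the sorted directory list into a real array
  mapfile -t dirs < <(find "$ARCHIVE_DIR" -maxdepth 1 -type d -regex "$ARCHIVE_DIR/20.*" | sort)
  writeLog ".. # of files found: ${#dirs[@]}"
  for dir in "${dirs[@]}"; do
    writeLog ".. considering folder $dir, compareFolderName=$compareFolderName"
  done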

 

Author Comment

by:jkurant
ID: 39229649
I am baffled. I decided to just change the code in question and see if that fixed the problem. So I tried this new version of cleanUpArchive, with a new find command, and I get the same results! It works for all except arareports.

cleanUpArchive() {
# Parameters:
#   $1 = AlgoDataType
#   $2 = Days to keep
# This will go through all of the dated archive folders, and if the dated archive folder is older than $DaysToKeep,
# then we will delete the file $AlgoDataType.tgz
  startDate=`date +%Y%m%d -d "$PositionDate -$2 days"`
  compareFolderName=$ARCHIVE_DIR/$startDate

  writeLog "<cleanUpArchive AlgoDataType=$1 DaysToKeep=$2 WillDeleteInFoldersBefore=$compareFolderName>"
  find $ARCHIVE_DIR -maxdepth 1 -type d -regex "$ARCHIVE_DIR/20.*" -printf %p@ | tr "@" "\n" | sort > $ARCHIVE_DIR/files.txt

  filesFound=`sort /opt/ris/archive/files.txt`
  writeLog ".. # of files found: ${#filesFound[@]}; filesFound=$filesFound"

  foldersDeleted=0
  foldersFound=0
  for dir in $filesFound; do
    writeLog ".. considering folder $dir, compareFolderName=$compareFolderName"
    # if the folder is before the compareFolderName, which is $2 days before $PositionDate, then we have to delete $1.tgz
    if [[ "$dir" < "$compareFolderName" ]]; then
      writeLog ".. the folder $dir will have the data for $1 DELETED"
      if [ -f $dir/$1.tgz ]; then
        rm $dir/$1.tgz >>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log
        if [ $? != 0 ]; then
          writeErr "cleanUpArchive: unable to remove the Algo Data Archive $dir/$1.tgz per the data-retention schedule"
          return 1
        fi
      fi
    else
      writeLog ".. the folder will have the data for $1 saved"
    fi
    foldersFound=`expr $foldersFound + 1`
  done
  writeLog ".. considered $foldersFound folders"

  rm $ARCHIVE_DIR/files.txt >>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log
  writeLog "</cleanUpArchive>"
}

 
LVL 27

Expert Comment

by:skullnobrains
ID: 39231183
what is the output of the find command alone ?
what are the access rights on the directories ?
can you confirm there are no spaces in the path ?
 
LVL 38

Accepted Solution

by:
Gerwin Jansen, EE MVE earned 500 total points
ID: 39231593
I see that you use different ways of referring to variables' contents:

$ARCHIVE_DIR
$MD5Filename.new
${LOG_DIR}
${PositionDate}.log

I'd change all variables that don't have curly braces {} to include them, to prevent issues.

For debugging purposes, start your script with the -xv option to see debug logging and find out where things go wrong.
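
For example (a hypothetical invocation; both -x and -v write their tracing to stderr):

  # -v echoes each line as it is read, -x echoes each command after expansion
  bash -xv ./ris_archive.sh arareports 2> trace.log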

Btw: you have a lot of 'double' redirections like:

>>${LOG_DIR}/${PositionDate}.log 2>>${LOG_DIR}/${PositionDate}.log

You can simplify like this:

>>${LOG_DIR}/${PositionDate}.log 2&1

This will redirect stdout to the logfile and append stderr to the same.
 

Author Closing Comment

by:jkurant
ID: 39233466
This isn't really the answer to the problem, but it did teach me something new, so I will have to accept it as the answer.

However, I have determined the actual answer myself, which is that I need to add | tr '\n' ' ' to convert the end-of-line characters to spaces. I don't know why this is necessary, though, because I have seen many examples online of taking the output of a find command and iterating over it with a for statement. It only worked intermittently in my case, and I never found out what made it work one time and not the other.
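
For the record, here is where the change goes, as a sketch applied to the filesFound assignment:

  # the fix: fold the newline-separated sort output onto one line
  filesFound=`sort /opt/ris/archive/files.txt | tr '\n' ' '`
  for dir in $filesFound; do
    writeLog ".. considering folder $dir, compareFolderName=$compareFolderName"
  done
  # one possible explanation, which I could not confirm: if IFS had been
  # changed somewhere so it no longer contained a newline, a newline-separated
  # list would be a single word to the for loop, while a space-separated
  # list would still split correctly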
 
LVL 38

Expert Comment

by:Gerwin Jansen, EE MVE
ID: 39233493
Thanks, a minor correction:

>>${LOG_DIR}/${PositionDate}.log 2>&1

BTW: You don't have to accept a comment as an answer, you can wait a bit for other experts to respond or ask attention via a moderator using the Request Attention button above (you can still do that now if you like).
 

Author Comment

by:jkurant
ID: 39233552
Thank you, Gerwin Jansen. I have the answer I need now, so I will leave the solution as is.
 
LVL 27

Expert Comment

by:skullnobrains
ID: 39234353
i do not care about points here, but beware when using the tr method this way: if any of the file names has a space inside it, the script will mistake it for 2 files. also, if the find command outputs too many files (more than 4096, i guess, in a recent bash), your script will not work.

there are many ways past this, but a simple one is to use a while loop

find ... | while read file
do
    ...
done

find itself can also -exec other commands iteratively, and many implementations have sorting features as well.
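
for example, an untested sketch (assumes GNU find/sort and bash; the null-delimited form keeps names with embedded spaces intact):

  # null-delimited version: file names with spaces or newlines survive
  find "$ARCHIVE_DIR" -maxdepth 1 -type d -name '20*' -print0 |
    sort -z |
    while IFS= read -r -d '' dir; do
      echo "considering $dir"
    done
  # beware: the loop body runs in a subshell here, so counters
  # incremented inside it will not be visible after the pipeline

  # or let find run the command itself, once per match
  find "$ARCHIVE_DIR" -maxdepth 1 -type d -name '20*' -exec echo considering {} \;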
