The goal of the script is explained in the above reference. I integrated the suggested bash script into my driver script, which compares files in a BASE tree with those in a TEST tree. As noted in that reference, the script was not perfect at handling edge cases involving file sizes. For a while, those edge cases were not a problem, and the script has already saved me a good deal of tedious manual comparison today.
Now the file-generating program has created a different number of files in one TEST folder than in the corresponding BASE folder. My driver routine reported that no diffs were performed at all.
Do you think we can beef up the script so that it can handle a different number of files? In one particular scenario, there were 3 files in the TEST folder and only 2 in the BASE folder. To make things more complicated, none of the file sizes matched, and a pure sort by file size would immediately fall out of sync because the new file had only a few bytes in it.
Now, the bulk of the folders showed minimal changes as desired (thanks again). For the couple of folders where no diffs were performed, I could manually confirm why there was an extra file. (A TEST folder could also have fewer files than the BASE folder.)
It seems that the complication, as discussed in the previous question, is that trial diffs have to be run to identify the closest-matching files. Certainly file size is a good way to reduce the number of combinations to compare. There are often just 2 or 3 files; the most I have seen is 6 or 7.
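To make the trial-diff idea concrete, here is a rough sketch of what I have in mind. It is not the script from the reference: the function names `diff_count` and `pair_closest`, the output format, and the choice to score every BASE/TEST combination and accept the cheapest pairs first are all my own assumptions. It skips the file-size prefilter entirely, which should be tolerable for 6 or 7 files per folder, and it assumes bash rather than plain sh.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of the referenced script.

# Count changed lines between two files: lines starting with '<' or '>'
# in the normal (default) diff output format.
diff_count() {
    diff -- "$1" "$2" | grep -c '^[<>]'
}

# pair_closest BASE_DIR TEST_DIR
# Scores every BASE/TEST file combination with diff_count, then accepts
# pairs from cheapest to most expensive, using each file at most once.
# Files left over on either side are reported as extras.
pair_closest() {
    local base_dir=$1 test_dir=$2
    local -a base_files=() test_files=() base_used=() test_used=()
    local f i j cost

    for f in "$base_dir"/*; do [ -f "$f" ] && base_files+=("$f"); done
    for f in "$test_dir"/*; do [ -f "$f" ] && test_files+=("$f"); done

    # Each scored line is "cost base_index test_index"; indices are used
    # instead of names so that file names containing spaces stay intact.
    while read -r cost i j; do
        [ -n "${base_used[$i]:-}" ] && continue
        [ -n "${test_used[$j]:-}" ] && continue
        base_used[$i]=1
        test_used[$j]=1
        printf 'PAIR\t%s\t%s\t%s\n' "${base_files[$i]}" "${test_files[$j]}" "$cost"
    done < <(
        for i in "${!base_files[@]}"; do
            for j in "${!test_files[@]}"; do
                printf '%s %s %s\n' \
                    "$(diff_count "${base_files[$i]}" "${test_files[$j]}")" "$i" "$j"
            done
        done | sort -n -k1,1 -k2,2 -k3,3
    )

    # Anything not consumed by a pair exists on only one side.
    for i in "${!base_files[@]}"; do
        [ -n "${base_used[$i]:-}" ] || printf 'EXTRA_BASE\t%s\n' "${base_files[$i]}"
    done
    for j in "${!test_files[@]}"; do
        [ -n "${test_used[$j]:-}" ] || printf 'EXTRA_TEST\t%s\n' "${test_files[$j]}"
    done
}
```

Calling `pair_closest "$BASE/some_folder" "$TEST/some_folder"` would print one tab-separated `PAIR` line per matched pair (BASE file, TEST file, changed-line count) followed by `EXTRA_BASE` or `EXTRA_TEST` lines for the leftovers. In the 3-versus-2 scenario above, I would expect the tiny new file to come out as `EXTRA_TEST`, since it is a poor match for everything. Accepting the cheapest pairs first is only a greedy heuristic, so it may not always find the best overall assignment.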
PART 3 (future) ...
Another complication that I have not dealt with until now (and that could be another question in Part 3, if it is doable) is that I sometimes expect a chunk of differences, where a chunk might be 20-40 lines of diff output. Maybe I have to give the script hints as to whether I am expecting differences (or maybe a diff line count threshold passed as input would be enough). It may also be the case that some folders in the tree should have no changes (i.e., a threshold of maybe 6 lines of diff output), while other folders might have some files that do.
Thanks again for your help. I brought up Part 3+ just to identify areas that we should not worry about for now. If we can modify the script to allow for a different number of files and to identify the file pairs with the closest match, that alone seems to be an interesting challenge.